Artifacts in output when using lower quants (Q2 and Q4 tested)
Thanks for the day-1 support. I have been experimenting with this model for a while, but ran into a weird issue that doesn't seem to have been encountered by others: at about 70% of the way through the denoising process, the model starts stacking the first input image onto the output. The stacking is not exact, so it cannot be removed by simply subtracting the input image. I have tried adding and removing the 4-step Lightning LoRA at various strengths (and also different versions of the LoRA), swapping the text encoder from q2 to q3, and different step settings and schedulers, but the result is always the same. I have tried both a clothes-swap and a reposing task, and both came out the same. ComfyUI and the GGUF nodes are all updated to the newest version, running without Sage Attention. The workflow is below (metadata included). Please let me know if this is a general issue with these low-bit quants or specific to my setup.
You need to connect the loaded images to the text encoder's image inputs.
Yes, those are connected, as shown in the workflow image. The workflow is actually pretty much the same as the official workflow, only modified to load GGUF models. I have tried both with and without image scaling, and with one or two inputs, but the results are the same.
Send a capture of the workflow with an output.
The extended part of the generated image will have color-difference issues. There is also a high probability of overlap issues with the reference image. If the Karras scheduler is used, the overlap with the reference image disappears, but the color difference is still present. Images generated with Q8 do not have the issues mentioned above.
Do you have it with the other quant versions?
Thanks for the advice on scheduler selection. I will try that later.
As for quants, I don't think I'll be able to go beyond q4 in a reasonable way even if I want to, since I'm on a mid-range gaming laptop. Being able to even run this 20B giant is already a blessing to me.
Edit: I can't seem to get the ghosting to go away even with the Karras scheduler when using the Lightning LoRA, plus I don't think that LoRA is compatible with Karras, since the output is much more deformed and blurry than with euler/simple. As requested, here is the workflow image and the output from that particular workflow. Note that I was trying the lcm/beta combo at the time and I know the result is definitely underdeveloped, but the ghosting is still there anyway.
Workflow:
Output from Qwen Chat (with a nearly identical prompt and identical input image):
Got fp8 to work, with painfully slow speed and quite a large toll on my SSD (thanks, MultiGPU nodes). However, the ghosting is still there, albeit a lot less noticeable (it now looks more like shadows, to be honest, but it still kind of follows the contour of the original image). Below is the output. Same prompt and same input images with the 4-step LoRA at strength 1, 4 steps, euler/simple. Quite interesting that no one else seems to be mentioning it.
Tbh I didn't see this issue happening to me. Could it be a cursed seed?
If this is indeed an issue with q4 and below, I would advise you to go higher with the quant. You can easily do that if you have enough RAM, even when your VRAM is not enough. For example, I have an RTX 4070 Ti with 12 GB and an old RTX 2070, plus 32 GB of DDR5-6600, which lets me load the CLIP model onto the second GPU and keep it there, while the main model stays on the CPU side but does its computation on the 4070 Ti. That way I can run q8 without any real speed penalty and get really high quality (this works with Wan etc. too). The secondary GPU is optional; you can run this thing with at least q5 even if you don't have that much RAM by using the MultiGPU custom nodes, specifically the DisTorch2 loaders (;
So I guess there is a layer here that is very sensitive to quantization. The question is whether it's a single one or the blocks from the editing part. If it's the latter, the quants would probably grow in size by around 1.5 GB or so to fix it.
In the meantime you should probably use a higher quant 😕
Thanks for that. I'll wait for refined quants then. Obviously I would like to run higher quants or even unquantized ones, but my hardware doesn't allow me to. I'm currently on 6 GB VRAM + 16 GB RAM (yeah, it is definitely a miracle that q2 runs smoothly; fp8 is definitely torturous for my hardware).
I tested around a bit. If it's not a specific layer (and that can be very hard to find) but multiple ones, the quants would increase in size by a lot. Did you check out the old model? If it's not in there, it might give us some clues 🤔
You mean the original Qwen-Image-Edit? No, the old one does not have that kind of issue, at least no ghosting (even on q2). It is definitely a peculiar issue.
Did you have any other case where this exact issue was happening?
Not really yet. Will try other inputs later.
Edit: Getting the rig up and running now. Will upload some examples later.
Left is the input, the middle panel is the intermediate result, and right is the final output. The prompt, sampler settings, steps, and LoRA stats are all on top. The workflow is embedded in every image (with the input and prompt varying between images, of course). All of them are q2 with the Lightning v2 LoRA at strength 1. It would be great if anyone could try out different step counts and quants, as my setup cannot do other settings in a reasonable time (maybe q2 at 20 steps, but even that will take quite a while, let alone other quants).
Ghosting does not always seem to happen at 70%, and the image that gets overlaid on top does not depend on the order (i.e. it is not simply the first one) but seems to be whichever one is not a control image (like a pose image or depth map). However, I have not yet gotten multi-reference working, so I am not able to check what else influences this behavior.
BTW, I noticed that the quantized version is not as capable as the full version. I saw someone create a fish-eye overhead view of the first input image, but on my side the model seems to do nothing. Also, changing the style to 3D animation with the 4th input does not work.
Okay, but just asking: is that Q2_K_S ours or calcius's? There is a Q2_K_S (calcius). If the results are the same with our quants, that's strange... this didn't happen in version 1.
Perhaps we should summarize a bit right now.
For those who just joined: currently the Q2 through Q4 quants are leaving artifacts in the generated images. Please try to use Q5 and above, or Q8 or the fp8 safetensors whenever possible.
You can run full BF16 (40GB) on 10GB VRAM, 64GB RAM, and an i7 processor without any issues (use advanced diffusion loader). Screenshot => 3x 32-inch monitors at 1200GHz.
GGUF + Lightning LoRA 4-steps or 8-steps is a bad idea, as it slows down generation and causes ghosting issues. Clear the cache when disabling the 4-steps LoRA, run four completely different generations unrelated to your intended output, and then try again.
I definitely wish I had that hardware, though. 6+16 is definitely not going to cut it (I did try fp8, and each step takes 3+ minutes plus extensive use of swap space).
Higher than q5 doesnt have the ghosting issue with ggufs (;
Can confirm that's the case. Though if you look closely there is still a bit of a silhouette left; if you're not that picky it is totally fine (it will mostly blend in pretty well).
I have the same problem. I asked in our community, but few people there use GGUF, so there was no way to solve it. The original image gets superimposed on the generated image, forming artifacts.
I have the same problem with the Q4_K_S quant; what is interesting is that the Q4_K_M quant does not have this problem.
Interesting. I will try that later.
I like the 2-step outputs. I am doing q5_k_m with 2 steps using the 4-step Lightning 1.0 with res_2s and beta_57. Will try some other settings and report back.
A image-edit lightning 4 step 1.0 at 2 step, res_2s beta_57
B image lightning 2.0 8 step at 4 step, res_2s beta_57
C image lightning 1.1 8 step at 4 step, res_2s beta_57
D image-edit lightning 4 step 1.0 at 4 step, res_2s beta_57
E image-edit lightning 4 step 1.0 at 2 step, res_2s beta_57
E<C<B=A<D
Still need a prompt to keep Qwen from zooming in all the time.
prompt: the sasquatch in Picture 2 saying goodbye to the bartender in Picture 1. sasquatch carrying a closed guitar case with right hand. the sasquatch has back to camera in middle of frame. sasquatch is wearing black leather jacket. everyone happy to see each other
input images
Hey guys, I wanted to add to this: I am also seeing issues with "ghosting"/"shifting"/"duplication" in the final image. For example, the input image in my workflow is:

which is downscaled by the workflow to 896x1120 pixels and then edited to change the hair color from blond to blue. Here is the output:

Here it's very noticeable: the woman randomly has two necklaces, but the image also has some shifting compared to the original image, which can be seen here:
Here are the most important nodes probably:

It is important to mention that I am using q5_1, but I also tried q4_k_m and got the exact same result.
I found this Reddit thread https://www.reddit.com/r/StableDiffusion/comments/1myr9al/use_a_multiple_of_112_to_get_rid_of_the_zoom/ mentioning that using a multiple of 112 when scaling input images gets rid of the zooming, and that seems to help a little, but the problem still appears.
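For what it's worth, the multiple-of-112 trick from that thread can be sketched as a small helper (my own illustration, not code from any workflow; the function name and the ~1-megapixel target are assumptions): scale the image to roughly one megapixel, then snap each side to the nearest multiple of 112.

```python
def snap_to_112(width, height, target_pixels=1024 * 1024):
    """Scale (width, height) to ~target_pixels, rounding each side
    to the nearest multiple of 112 (hypothetical helper)."""
    scale = (target_pixels / (width * height)) ** 0.5
    w = max(112, round(width * scale / 112) * 112)
    h = max(112, round(height * scale / 112) * 112)
    return w, h

# A 1536x1920 input snaps to 896x1120, matching the downscaled
# size mentioned above.
print(snap_to_112(1536, 1920))  # -> (896, 1120)
```

In ComfyUI you would feed the resulting dimensions into whatever image-scale node the workflow uses before the text encoder and the KSampler latent.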
Hi @Joly0 , make sure the initial image is downscaled to one megapixel, and that this downscaled image is the one you use as input to the latent of the KSampler. That has to do with the sizes the model was trained on. I’ll give you an example with your same image but using my workflow with Q4_K_M.
Hey @YarvixPA thank you for the response. I checked and the image is downscaled to 1MP before proceeding.
You should be able to check the workflow I am using with this image (it should include the workflow as-is).
Everything looks right from my side, but I might be missing something. For me it looks like this:
What's weird is that the workflow includes a second workflow on the right that looks a bit more like yours, and that one works without problems. Maybe I am missing something, but I can't get it to work.
Also, an off-topic question: what is this "Models loader" node? It looks useful, but I can't find it anywhere.
@Joly0 In my workflows I use the ComfyUI subgraph. Basically it is a way to group several nodes, and you have the freedom to decide which inputs and outputs the visible node in your workflow should have.
It's like grouping, and when you enter that group you see the other nodes. Useful to reduce the number of nodes and keep only what is necessary. For example, in the node you saw, the three loader nodes are grouped into a single one.
Hey @YarvixPA, I was able to fix the issue. I am not sure where exactly the problem was, but the editing now works. The only problem left is that there is a slight zoom-in on the final output image compared to the scaled image. I tried to rebuild your workflow 1:1, even with the same seed, etc., but it is still zooming in slightly. From what I can read this appears to be a known issue somewhere (some speculate it's maybe the VAE), but I have no idea.
@Joly0 Yes, the output I shared has the same zoom. This type of thing comes from the model, not the quant. In my opinion Kontext has better consistency, but Qwen Image is close and can improve the image more (I mean it can imagine the details; Kontext keeps things the same as the input).
I don't know why the quants from q2 to q4 (except Q4_K_M) have the issue shared in this discussion. But I think it is something about the multi-image support, because the Q2 and Q4 quants of the previous version don't have this ghosting/duplication issue.
What was the name of the node pack? I'm looking around for "Qwen Advanced Diffusion Loader" but no luck.
Hey rzgar, I was also looking for this node pack and haven't had much luck finding it. Could you point us in the right direction?
@moonslink sorry for the late reply, I didn't see it until today. @artificialcx
Hey artificialcx, the loader is part of the custom nodes from kijai (https://github.com/kijai/ComfyUI-KJNodes).
But the one I use in the screenshot is a modified version of (https://github.com/xuchenxu168/ComfyUI_Qwen-Image). For these custom nodes you also need to install DiffSynth (https://github.com/modelscope/DiffSynth-Studio); it's a pretty simple process.
First, clone DiffSynth somewhere outside the ComfyUI directory.
Activate the ComfyUI Python environment, 'cd path-to-diffsynth-studio', and run 'pip install -r requirements.txt'.
After installing DiffSynth, clone the ComfyUI_Qwen-Image repo into ComfyUI\custom_nodes and restart ComfyUI.
Go to ComfyUI -> Templates -> ComfyUI_Qwen-Image -> qwen image standard workflow.
In case you want my modified nodes, tell me and I'll upload and share them with you.
Remember, you need to install Sage Attention and Triton for Windows (if you are on Windows; use Ubuntu if you can, for a much smoother experience and less overhead). There are pre-compiled wheels for most CUDA and Python versions.
Are there going to be updated quants that fix this?
It’s not impossible, but it requires testing different quantization combinations to determine which tensors contain the “multi-image” data. Since the tensors are the same and no new ones were added, the quantization applied to version 2509 was the same one used for Qwen Image, and also for Qwen Image Edit, where this “ghosting” issue doesn’t appear.
Did a first test; I have to redo it with all models, but the image is clean on the Q2_K quant:
// first/last block high-precision test
if (arch == LLM_ARCH_QWEN_IMAGE) {
    if (name.find("transformer_blocks.0.") != std::string::npos ||
        name.find("transformer_blocks.59.") != std::string::npos) { // this should be dynamic
        if (ftype == LLAMA_FTYPE_MOSTLY_Q2_K ||
            ftype == LLAMA_FTYPE_MOSTLY_Q3_K_S ||
            ftype == LLAMA_FTYPE_MOSTLY_Q3_K_M ||
            ftype == LLAMA_FTYPE_MOSTLY_Q3_K_L ||
            ftype == LLAMA_FTYPE_MOSTLY_Q4_0 ||
            ftype == LLAMA_FTYPE_MOSTLY_Q4_1 ||
            ftype == LLAMA_FTYPE_MOSTLY_Q4_K_S ||
            ftype == LLAMA_FTYPE_MOSTLY_Q4_K_M) {
            new_type = GGML_TYPE_Q5_K; // minimum Q5_K for low quants
        } else if (ftype == LLAMA_FTYPE_MOSTLY_Q5_K_M) {
            new_type = GGML_TYPE_Q6_K;
        }
    }
}
did the trick, further testing...
How?
Well, probably update the quants with that info (;
Lower quants updated
Since the first version I noticed that the Edit GGUF Q4 performed poorly and differed a lot from the official version; so there really was a problem. I wonder whether this release truly fixes it completely.
Download it, give it a try, and share your feedback. It's normal to see degraded performance when a high-precision model (like BF16) is quantized from 16-bit down to 4-bit, 3-bit, or even 2-bit in order to reduce its size, fit within hardware limits, and lower VRAM usage.
This issue isn't exclusive to GGUF; it also occurs with other quantization methods.
Even Q2_K should work halfway okay, and it's also possible to change the minimum quant for all the quants in this section to q8_0. It's a little bit larger, but it gives another small quality boost.
That's correct. For Qwen Image, the approach of keeping the first and last blocks in high precision had been implemented to improve the outputs. The same technique was later used for Qwen Image Edit, and it worked well. For the new version, Qwen Image Edit 2509, we knew that increasing it could possibly improve the results, although there was some doubt, since it would also increase the model's size.
But yes, I tested Q2_K and Q4_K_S, and the results are better. They are now available in the repo.
I tried the new Q3, it's much better. Thanks!
Btw, @Phil2Sat: if you want to collaborate on QuantStack, send me your Discord account so we can be in contact and I'll invite you to the org. Our purpose is to quant models for the community.
@YarvixPA
Sorry for being late; I didn't ignore the message, but my brain couldn't come up with an answer. So, my Discord (last used half a year ago, I guess) is "phil2sat".
I'm no developer, just trying to keep my brain fresh. If I want a model to work and no one has a working model, I have to make it myself.
The low-quant fix was a gut feeling while I was trying to understand the code. What if... and it works...
My hardware is so low-spec that I needed some time to requant a model I use. Actually, I'm trying to rebuild the model in fp32 for better quant quality, since the maintainer only has an fp8 version.
It takes an hour just to load the model. With a PC from 2012 and a GPU from 2016, Qwen is normally a bit out of spec, but it runs.
So for your question, i would...