CLIP-LIT

Iterative Prompt Learning for Unsupervised Backlit Image Enhancement

Our Results


Our method

We devise a prompt learning framework that first learns an initial prompt pair (a positive prompt for well-lit images and a negative prompt for backlit images) by constraining the text-image similarity between each prompt and the corresponding image in the CLIP latent space. We then train the enhancement network based on the text-image similarity between the enhanced result and the initial prompt pair; the prompt-initialization step is sketched below.
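As a rough illustration of the prompt-initialization step, the sketch below learns a prompt pair against frozen CLIP. Two simplifying assumptions that are not taken from the released code: the prompt pair is modeled as two learnable vectors directly in CLIP's joint embedding space (the paper instead learns soft text tokens passed through the text encoder), and the input batches are assumed to be already resized and normalized for CLIP.

import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()                  # avoid fp16/fp32 mixing on GPU
for p in clip_model.parameters():                # CLIP itself stays frozen
    p.requires_grad_(False)

embed_dim = clip_model.text_projection.shape[1]
# Simplification (assumption): the prompt pair lives directly in CLIP's
# joint embedding space rather than being soft tokens in the text encoder.
prompt_neg = torch.randn(embed_dim, device=device, requires_grad=True)  # "backlit"
prompt_pos = torch.randn(embed_dim, device=device, requires_grad=True)  # "well-lit"
prompt_optimizer = torch.optim.Adam([prompt_neg, prompt_pos], lr=1e-4)

def prompt_logits(images):
    # Cosine similarity between image features and the two prompts,
    # scaled by CLIP's learned temperature. images: (B, 3, 224, 224),
    # assumed to be normalized with CLIP's preprocessing.
    img_feat = F.normalize(clip_model.encode_image(images), dim=-1)
    prompts = F.normalize(torch.stack([prompt_neg, prompt_pos]), dim=-1)
    return clip_model.logit_scale.exp() * (img_feat @ prompts.t())  # (B, 2)

def prompt_init_loss(backlit, well_lit):
    # Binary cross-entropy: backlit images should match the negative prompt
    # (class 0) and well-lit images the positive prompt (class 1).
    logits = torch.cat([prompt_logits(backlit), prompt_logits(well_lit)])
    labels = torch.cat([torch.zeros(len(backlit), dtype=torch.long),
                        torch.ones(len(well_lit), dtype=torch.long)]).to(device)
    return F.cross_entropy(logits, labels)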
To further improve the accuracy of the initial prompt pair, we iteratively fine-tune the prompt learning framework via rank learning, reducing the distribution gaps among the backlit images, enhanced results, and well-lit images and thereby boosting the enhancement performance. Our method alternates between updating the prompt learning framework and the enhancement network until visually pleasing results are achieved; a sketch of this alternation follows below. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in visual quality and generalization ability, without requiring any paired training data.
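Under the same simplifications, the alternation between the two training stages might look roughly like the sketch below, which reuses prompt_logits, prompt_neg, prompt_pos, and prompt_optimizer from the previous snippet. The enhancement network, data loader, margin, and round/step counts are illustrative placeholders rather than the paper's actual architecture or hyper-parameters; the rank-learning term orders the negative-prompt score as backlit > enhanced > well-lit, matching the direction described above.

# Placeholder enhancement network and data; the paper trains a dedicated
# enhancement model on unpaired backlit / well-lit images. These stand-ins
# only make the sketch runnable.
enhancement_net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 3, 3, padding=1), torch.nn.Sigmoid(),
).to(device)
enh_optimizer = torch.optim.Adam(enhancement_net.parameters(), lr=1e-4)
train_loader = [(torch.rand(4, 3, 224, 224), torch.rand(4, 3, 224, 224))
                for _ in range(8)]               # (backlit, well-lit) batches

def neg_prompt_score(images):
    # Probability that CLIP assigns an image to the negative (backlit) prompt.
    return F.softmax(prompt_logits(images), dim=-1)[:, 0]

def prompt_rank_loss(backlit, enhanced, well_lit, margin=0.2):
    # Rank learning: the negative-prompt score should decrease from
    # backlit images to enhanced results to well-lit images.
    s_b, s_e, s_w = (neg_prompt_score(x) for x in (backlit, enhanced, well_lit))
    ones = torch.ones_like(s_b)
    return (F.margin_ranking_loss(s_b, s_e, ones, margin=margin) +
            F.margin_ranking_loss(s_e, s_w, ones, margin=margin))

for round_idx in range(3):                       # illustrative round count
    # Stage 1: update the enhancement network with the prompt pair fixed,
    # pulling enhanced results toward the positive (well-lit) prompt.
    for backlit, _ in train_loader:
        enhanced = enhancement_net(backlit.to(device))
        loss = neg_prompt_score(enhanced).mean()
        enh_optimizer.zero_grad(); loss.backward(); enh_optimizer.step()

    # Stage 2: fine-tune the prompt pair by ranking frozen network outputs
    # against the original backlit and reference well-lit images.
    for backlit, well_lit in train_loader:
        backlit, well_lit = backlit.to(device), well_lit.to(device)
        with torch.no_grad():
            enhanced = enhancement_net(backlit)
        loss = prompt_rank_loss(backlit, enhanced, well_lit)
        prompt_optimizer.zero_grad(); loss.backward(); prompt_optimizer.step()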

Iterative Performance

Over the iterations, the learned negative prompt becomes increasingly focused on the regions with unpleasant lighting and color.

After sufficient iterations, over-saturation is corrected and the dark regions appear closer to a well-lit state than in the previous outputs.

Video demo

  Authors: Zhexin Liang, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Chen Change Loy
S-Lab | Nanyang Technological University, Singapore
Accepted to ICCV 2023, Oral