Fundamentals of AI Art Generation

CBR JGWRR · May 12, 2025

MacGowan said:
right? I was happy with how it made it all one style. don't think I could recreate that for say 20-30 images though, but it's a start.

Have you tried putting the image back into ChatGPT? Say you find the character you're most happy with: upload that image into ChatGPT and write "give this character gloves". Then you take that image and upload it into ChatGPT again to work on the background? Could work?

Did try, but the results were still hit and miss... While ChatGPT was much better at getting the feel of the painting, it still had a habit of missing elements.

Overall, I do keep in mind that what I'm working on with the Xenayan space program is a very niche project artistically, but it is difficult not to feel a little bit disappointed.

dmurgell · May 26, 2025

Hello everyone
I arrived at this thread thanks to @Chac1, who always has great advice. Thanks @Chac1!

For those who are not yet familiar with my CK2 AAR, I will sum it up by saying it is a comedic chronicle of Scotland, ruled by the Blackadders. You can see the link in my signature. From the very beginning, I decided to use only AI-generated images in the AAR, not game screenshots. This gives me quite a lot of freedom with the story, as I can add absurd details which do not appear in the game. I was not aware of this thread, so I have just been learning and trying 'solo' ... I am using both ChatGPT and Copilot for image generation. I was not initially aware that they both rely on DALL·E 3.

What I have found to date (sorry, I was not really 'documenting' the process, as I only used it for 'personal work') is:

- Add as much context as you can before starting "the prompt" itself. It dramatically improves results. Really.
Example : I am writing an After Action Report based on Crusader Kings 2 game, where the main character is Lord Edmund Blackadder. It is a comedic AAR mixing recurring jokes and character from the TV series Blackadder with CK2 game events. The current chapter is set in year 769 in Edinburgh.

- Add the overall idea / scenario where your image is to be used.
Example : I need an image to illustrate a passage where Lord Edmund is on pilgrimage to Ireland, where he would pay a visit to St. Patrick's tomb. During the journey, he is assaulted by thieves but somehow manages to fight them off.

- Finally, "the prompt" itself, specifying what you want pictured, in which style, the size or proportions expected, and all the details that you need.
Example : With the previous context, generate an image of Edmund Blackadder during his pilgrimage. He is wearing noble robes and a crown. As it is on Ireland, picture a leprechaun and some four-leaf clovers. I want it to look like an illustrated manuscript similar to the Book of Kells. I want the image to be on landscape, that is wider than tall.
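The three layers above (context, scenario, then the prompt itself) can also be assembled programmatically if you generate many images. This is only a minimal sketch: `build_prompt` is a hypothetical helper, not part of any ChatGPT or Copilot API, and the texts are shortened from the examples above.

```python
def build_prompt(context: str, scenario: str, request: str) -> str:
    """Join the three layers in order: background, scenario, then the request itself."""
    layers = [context, scenario, request]
    # Drop empty layers so a missing one does not leave a blank paragraph.
    return "\n\n".join(layer.strip() for layer in layers if layer.strip())


prompt = build_prompt(
    context="I am writing an After Action Report based on Crusader Kings 2, "
            "where the main character is Lord Edmund Blackadder.",
    scenario="I need an image to illustrate a passage where Lord Edmund "
             "is on pilgrimage to Ireland.",
    request="Generate an image of Edmund Blackadder during his pilgrimage, "
            "styled like an illuminated manuscript, wider than tall.",
)
print(prompt)
```

Keeping the layers separate makes it easy to reuse the same context for a whole chapter and change only the scenario and the request.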

I just made up this example, and this is the result :

ChatGPT Image 26 de maig del 2025, 19_05_28.png

Nice, but it does not look like the Book of Kells. I assume that is because in my last session I was asking for comic style, and maybe ChatGPT still has that memory (although I started a new conversation). So I added a new prompt : Find images of the Book of Kells and use them as style guide for the image. Generate it again.

This is the answer with the "process" Chat GPT has done :

And the image after this process :

ChatGPT Image 26 de maig del 2025, 19_14_28.png

Which looks funny ... but not what I wanted.

So, what I will do now is close the browser, clear the cache, delete the history, ask ChatGPT to erase its memory as well, and try again ...

Stay tuned for next post.

dmurgell · May 26, 2025

Okay, so this is the result after restarting the browser and asking ChatGPT to erase the memory of the last conversation. Same prompt, used in a new conversation.

Generate an image of Edmund Blackadder during his pilgrimage. He is wearing noble robes and a crown. As it is on Ireland, picture a leprechaun and some four-leaf clovers. I want it to look like an illustrated manuscript similar to the Book of Kells. I want the image to be on landscape, that is wider than tall.

Chat GPT Image 26 de maig del 2025, 20_09_25.png

Now it looks closer to my idea, finally getting rid of the "comic" style that was previously used. But sadly the facial resemblance has lost quite a bit of detail.

Using the same prompt but changing the style definition, just for the fun of testing. I stay in the same conversation so the context remains active.

Generate an image of Edmund Blackadder during his pilgrimage to Ireland. He is wearing noble robes and a crown. As it is on Ireland, picture a leprechaun and some four-leaf clovers. I want it to look like a Caravaggio baroque painting . I want the image to be on landscape, that is wider than tall.

Chat GPT Image 26 de maig del 2025, 20_15_43.png

Seems hands are no longer an issue for AI, which is great. However, the face does not resemble Blackadder at all ...

Let's do one more try, in a different style (just to see the results).

Generate an image of Edmund Blackadder during his pilgrimage to Ireland. He is wearing noble robes and a crown. As it is on Ireland, picture a leprechaun and some four-leaf clovers. I want it to look like a professional photography. I want the image to be on landscape, that is wider than tall.

I do like testing different styles for the same prompt, then selecting one and using it for an entire chapter in the AAR. In the next chapter, I choose a different style. It might cause some 'cohesion issues' between chapters, but I think it's also interesting and fun — and a real chronicle spanning over 500 years would naturally feature different styles.

MacGowan · May 27, 2025

Chac1 said:
Thanks @CBR JGWRR & @MacGowan for keeping this thread interesting with further discussions of image making and the image making process.

Thank you for keeping it alive! I'm hoping to see some new AI discoveries. I know Reddit has lots of stuff, but this thread is more in the vein of what I use AI for.

Chac1 said:
I do hope @MacGowan that when you launch your AAR with these images that you will let us know in this thread so we can see how you are using them.

I've published the worldbuilding on some social media platforms:

Not getting much traffic. I also made a YouTube channel, but I'm not really feeling the videos. I worked with Suno for the music and ElevenLabs for the narration, but it's a bit cumbersome, and meh. I'm hoping to launch the AAR next year, but I def need to figure out the game mechanics; I might nick some Axis & Allies and Darkest Hour mechanics. I'm also thinking about seeing if I can use ChatGPT or Claude to play against (by feeding them screenshots). I would love to know if anyone here has played with an LLM as a CPU gaming opponent.

Chac1 said:
So far, we have no confirmed reports of a platform that will guarantee cohesive character creation across scenes and different poses, at least not in a free version. (One can dream, no?)

However, I do find this workaround is one way to try it: use your character in a reference image that may help with the next generation. Thanks @MacGowan for reminding us of that. When in a corner, try reusing older generations to spark new creation.

Yeah, ChatGPT/Sora can be a bit annoying, but it's the only one I've found that lets me reuse characters like this:

CBR JGWRR said:
Overall, I do keep in mind that what I'm working on with the Xenayan space program is a very niche project artistically, but it is difficult not to feel a little bit disappointed.

Yeah, it seems the more specific we want the image to be, the harder it is to generate correctly.

dmurgell said:
Hello everyone

hello!

dmurgell said:
For those who are not yet familiar with my CK2 AAR, I will sum it up by saying it is a comedic chronicle of Scotland, ruled by the Blackadders.

I love me some Blackadder! I'll def check out your AAR.

dmurgell said:
my last session I was asking for comic style, and maybe ChatGPT still has that memory (although I started a new conversation).

I had similar problems, where ChatGPT kept sneaking traditional Chinese soldiers into everything I made. I found it had saved a memory of it in its settings, so I went there, deleted it, and turned off the memory setting.

dmurgell said:
So, what I will do now is close the browser, clear the cache, delete the history, ask ChatGPT to erase its memory as well, and try again ...

Yeah, I found it's better to wipe everything every time when starting a new window.
You can also check out http://sora.chatgpt.com. It's the image generator OpenAI / ChatGPT is using, but it's the dedicated site for it. That one doesn't seem to have this memory issue, and it gives you 4 variations when you generate.

dmurgell · May 27, 2025

MacGowan said:
and it gives you 4 variations when you generate

Double the fun, half the price!

Thanks @MacGowan for all the tips. Definitely worth checking that 'memories setup' I was not aware of. It had a mix of previous conversations and ended up messing up the results. Now, without memory, I guess each conversation will be independent and not affected by previous commands.

CBR JGWRR · May 27, 2025

An LLM as a gaming opponent is an interesting concept. I wonder if it would do better or worse than a conventional video-game AI...

Those worldbuilding pictures are really impressive for consistency of style.

MacGowan · Jun 3, 2025

dmurgell said:
Thanks @MacGowan for all the tips. Definitely worth checking that 'memories setup' I was not aware of. It had a mix of previous conversations and ended up messing up the results.

I'm glad it helped!

CBR JGWRR said:
An LLM as a gaming opponent is an interesting concept. I wonder if it would do better or worse than a conventional video-game AI...

I think it will be much more unpredictable and buggy. But might work if i keep a close watch. I know they got Claude to beat Pokemon Blue with just using screenshots.

CBR JGWRR said:
Those worldbuilding pictures are really impressive for consistency of style.

thank you! It's def hard to get the style consistent across the board.

Anyone tried any video ais yet? i hear google veo3 is insane.

MichOrion · Jun 4, 2025

Over the past few months, I’ve been developing a visual archive for my tabletop RPG project, Hollow Crown: No Lords but the Living, inspired by the After the End mods for Crusader Kings, a post-collapse, neo-medieval setting rooted in ritual, memory, and survival along a reimagined Mississippi and Appalachian interior. The goal has been simple but demanding: generate images that look like in-world artifacts, not fantasy concept art.

Consistency has been the hardest part — and the most satisfying win. Every image is generated using a strict workflow grounded in:

Extensive Project Files and editorial instructions, covering lore, tone, formatting, and mechanical integration
A closed corpus — no speculation or extrapolation beyond what’s written
A visual rulebook: real-world materials only (cloth, bone, iron, wood), natural light only (candle, torch, overcast), and no fantasy drift
Captions are structured as diegetic entries: reverent, in-universe, and formatted like those found in an illustrated psalter or faction ledger
Any visual misstep — even a bow held wrong or a feather too bright — gets corrected and re-rendered using direct textual reference
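A fixed rulebook like this can be kept consistent by storing it once and appending it to every image request. The sketch below is only an illustration of that idea, not the actual workflow described here: `with_rulebook` and the constant names are hypothetical, and the rule values are copied from the list above.

```python
# Hypothetical encoding of the rulebook as data; values taken from the list above.
ALLOWED_MATERIALS = ["cloth", "bone", "iron", "wood"]
ALLOWED_LIGHT = ["candle", "torch", "overcast"]


def with_rulebook(subject: str) -> str:
    """Append the same fixed constraints to any subject description."""
    rules = (
        f"Materials: real-world only ({', '.join(ALLOWED_MATERIALS)}). "
        f"Lighting: natural only ({', '.join(ALLOWED_LIGHT)}). "
        "No fantasy elements."
    )
    return f"{subject.strip()}\n\n{rules}"


rendered = with_rulebook("A matron tending a communal oven-shrine")
print(rendered)
```

Because the rules live in one place, a correction to the rulebook applies to every later generation without retyping it.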

Here are a few of the results so far:

Butternutter Hearthtender – matron of a communal oven-shrine

Conclavian Field Confessor – priest taking indulgence from shackled rebels, offering absolution in exchange for certain death in the front line of the President's forces

Ani-Nantahi Spirit-Warrior – forestbound rite-fighter armored in bone and oath

Ophiolatrist Shrine-Matron – binding rites lit by serpent-candles in a cave shrine

Riverlander Duelists – fiddle duel scene shown as graffiti in a textbook from centuries later

Saint of Arch and Mass (Stained glass).png

Saint of Arch and Mass – stained glass depiction of a Builder-Saint holding a civic structure

Deltaic Cavalier – a second-born noble in ceremonial duel attire, quilted with family thread

If you’re building your own in-world visual stylebook, especially for game settings, historical fiction, or alt-future societies, I’ll be happy to share process details or the full Image Workflow.

I've got many more examples and I'll post more if there is more interest!

I'm currently developing character images from bios. Multiple figures are still difficult, as is recreating a generated character in a new pose in a consistent way.

MichOrion · Jun 4, 2025

MacGowan said:
I've published the worldbuilding on some social media platforms:

Wow, I love the subject! Hard sci-fi, Expanse-inspired? Looks like some hard-fought consistency here!

CBR JGWRR · Jun 4, 2025

MacGowan said:
I'm glad it helped!

I think it will be much more unpredictable and buggy. But might work if i keep a close watch. I know they got Claude to beat Pokemon Blue with just using screenshots.

To be fair, I think Pokemon is something that would be relatively easy for an LLM, as the strategic level is well within the limits of its long-term memory, and especially for Pokemon Blue - being from way before they started the XP reduction for beating lower level Pokemon - it would have the patience to grind out level 100 mons from the start of the game, and that would trivialise the rest of the game once the moveset of the mon is loaded with pure attack moves.

Or did they get it to complete the Pokedex?

I think an LLM would have a much harder task of, say, playing CK3.

MacGowan said:
thank you! It's def hard to get the style consistent across the board.

Definitely...

MacGowan said:
Anyone tried any video ais yet? i hear google veo3 is insane.

I've been unhappy with the attempts I've done with LeonardoAI, but equally I've not tried it in a while so it may be improved by now.

Edit.

I've now blown through 25k Tokens on LeonardoAI with the VEO3 model (at 2500 a go...) and well.. It takes a good four minutes to generate with my system - a mid-range gaming desktop that's a few years old now.

But, these forums don't support .mp4 files...

And that means you guys can't see the short video of Naomi and Buri cuddling, with Buri opening and shutting his jaws at a pace suggestive of saying "I love you." It's sweet.

MacGowan · Jun 8, 2025

MichOrion said:
I've got many more examples and I'll post more if there is more interest!

Very interesting stuff, and it looks great! Are you gonna do an AAR on the RPG?

MichOrion said:
Wow, I love the subject! Hard sci-fi, Expanse-inspired? Looks like some hard-fought consistency here!

Thanks! The list of inspiration is long, but definitely some Leviathan Wakes in there!

CBR JGWRR said:
To be fair, I think Pokemon is something that would be relatively easy for an LLM, as the strategic level is well within the limits of its long-term memory

CBR JGWRR said:
I think an LLM would have a much harder task of, say, playing CK3.

For sure. Also, setting up an agentic AI is far beyond my abilities. But I wonder if you even need memory. A Claude Sonnet or the newest ChatGPT would see screenshots of CK3 (map, character, resources, etc.) and should know what is best from context (maybe you add a quick summary of what is happening alongside the screenshots).

Then, after it has made its moves, you just screenshot everything and feed it to a fresh AI with your updated summary. Wash, rinse, repeat.

That's super cumbersome for CK3 (unless automated). But for a turn based tabletop wargame it might work fine.
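The memory-free loop described above could be sketched roughly like this. Everything here is an assumption for illustration: `TurnInput`, `play_turn` and `fake_model` are hypothetical names, and `ask_model` stands in for whatever chat service you would actually call (no real API is used).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TurnInput:
    summary: str            # short human-written recap of the game so far
    screenshots: List[str]  # paths to screenshots (map, character, resources, ...)


def play_turn(turn: TurnInput, ask_model: Callable[[str, List[str]], str]) -> str:
    """Ask a fresh model for one move; nothing is carried over between calls."""
    prompt = (
        "You are my opponent in a turn-based game.\n"
        f"Summary so far: {turn.summary}\n"
        "Look at the attached screenshots and reply with your next move."
    )
    return ask_model(prompt, turn.screenshots)


def fake_model(prompt: str, images: List[str]) -> str:
    # Stand-in for a real LLM call, so the sketch runs without any service.
    return f"(stub) move chosen from {len(images)} screenshot(s)"


move = play_turn(
    TurnInput("Turn 3: I hold the northern province.", ["map.png", "resources.png"]),
    fake_model,
)
print(move)
```

All state lives in the summary and the screenshots, so each turn can go to a brand-new conversation; the cost is that the summary has to be kept up to date by hand.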

CBR JGWRR said:
I've been unhappy with the attempts I've done with LeonardoAI, but equally I've not tried it in a while so it may be improved by now.

Edit.

I've now blown through 25k Tokens on LeonardoAI with the VEO3 model (at 2500 a go...)

Oh, wow, I didn't know it was up on Leonardo. thanks for the tip!

So what's the verdict? worth it?

CBR JGWRR · Jun 8, 2025

Worth it...

It depends.

At 2500 tokens a go, it's very pricy. (with the subscription I'm on, I get 8500 tokens each month) Leonardo's Motion 2.0 uses 200 tokens a go, and well...

The cost means I've only done two comparisons. One is the Naomi and Buri wedding day scene.

Veo 3 got the really sweet video that had Buri not only passably close to right for a Xenaya (he even got sabre teeth this time!) but also in a fitting wedding suit and the aforementioned opening and shutting his mouth in a way consistent enough with saying "I love you" that I'd take it. It got Naomi wrong, granted (unusually, probably 90% of the time it's Buri that's generated wrong) as her hair was depicted as frizzy, not braided.

Motion 2.0 got the characters right by virtue of being able to use an image to video prompt, but she does this weird forehead kiss on the eyebrow and he nuzzles her chest. Sweet, but not as sweet as Veo 3's effort.

The second comparison was a character selection screen concept video depicting an orc chief eating the arm of his last victim while resting his dual machineguns on the floor. Motion 2.0 really struggled to get the orc right, often making a human-orc hybrid, but Veo 3 depicts the orc bashing his meal -laid on the floor- so hard the arm flies up into his mouth. Totally unrealistic, but also totally in character.

I'd say if you are happy with the much reduced throughput - 3 Veo 3 videos against 42 Motion 2.0 videos for the 8500 tokens - then Veo 3 has certainly got its perks. But equally, this is a sample size of two videos...
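For anyone checking the throughput figures, the arithmetic is just integer division of the monthly allowance by the cost per video. The numbers below are the ones quoted in this post; the variable and function names are made up for the example.

```python
MONTHLY_TOKENS = 8500  # monthly allowance on the subscription mentioned above
COST_PER_VIDEO = {"Veo 3": 2500, "Motion 2.0": 200}


def videos_per_month(budget: int, cost_per_video: int) -> int:
    """Whole videos that fit in the budget; leftover tokens are ignored."""
    return budget // cost_per_video


for model, cost in COST_PER_VIDEO.items():
    print(f"{model}: {videos_per_month(MONTHLY_TOKENS, cost)} videos")
# Veo 3: 3 videos
# Motion 2.0: 42 videos
```

Put another way, one Veo 3 generation costs as much as 12.5 Motion 2.0 generations.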

MichOrion · Jun 10, 2025

MacGowan said:
Very interesting stuff, and it looks great! Are you gonna do an AAR on the RPG?

Yes, I am here!

dmurgell · Jun 10, 2025

I experimented quite a lot with ChatGPT Sora (within the free user limits) and I'm pretty impressed with the results.
You can see photorealistic generated images of Blackadder's adventures in my latest chapter here.

MacGowan · Thursday at 00:24

CBR JGWRR said:
At 2500 tokens a go, it's very pricy. (with the subscription I'm on, I get 8500 tokens each month) Leonardo's Motion 2.0 uses 200 tokens a go, and well...

Oh, damn, that's pricey. Even with simple still images you need to make so many. AI generating is like shooting at a barn door and painting the target on afterwards.

MichOrion said:
Yes, I am here!

Very interesting stuff! and the images work great to tie it together. I like the nice dark contrasts and dirt of it. Looks medieval, but also post-apocalyptic.

dmurgell said:
I experimented quite a lot with ChatGPT Sora (within the free user limits) and I'm pretty impressed with the results.

You can see photorealistic generated images of Blackadder's adventures in my latest chapter here.

Love a little Lord Flasheart cameo WOOF!.

Good read, nice visuals here too. ties the thing together for sure.

CBR JGWRR · Thursday at 08:11

MacGowan said:
Oh, damn, that's pricey. Even with simple still images you need to make so many. AI generating is like shooting at a barn door and painting the target on afterwards.

True. I remember the hundreds of images rejected trying to get Naomi and Buri last year...

But, so far, Veo3 seems to have a substantially smaller barn door. Possibly even normal door sized, although I'm yet to run enough tests to be sure.

CBR JGWRR · 2025-06-24T19:02:28+0200

CBR JGWRR said:
True. I remember the hundreds of images rejected trying to get Naomi and Buri last year...

But, so far, Veo3 seems to have a substantially smaller barn door. Possibly even normal door sized, although I'm yet to run enough tests to be sure.

So, with this in mind, I tried the new GPT-image-1 model to see how it compares.

Prompt here was:
"A simple wooden carving of a black African woman with flowing braided hair wearing a white wedding dress embracing with a majestic golden furred bipedal creature with long sabre teeth and a pair of horns rising from it's head, on their wedding day."

What I had in mind is a plain carving that someone might make with a chisel (or claw) - start from solid piece, and remove material to uncover the sculpture.

Leonardo's Phoenix 1.0 didn't get the memo - a sizeable number had Naomi and Buri gender-swapped, and not one of them had Naomi's hair braided. But, I do have to point out that - with the exception of numerous misplaced horns - all 40 images generated had a Xenayan figure for Buri/"Buri". Not even one "human with horns" failure. (apart from on Naomi, for whom several images depicted her as white or Asian)

Not one was a wooden carving; the closest it managed was the odd generation putting a wood background on.

Getting disappointed, I moved on to GPT-image-1.

It automatically expanded the prompt to:
"A regal black woman with dark brown skin, adorned with delicate, beaded braids and subtle, shimmering makeup, wearing a flowing, lace-trimmed white wedding gown with a sweetheart neckline and a full, gathered skirt, tenderly embracing a majestic, bipedal creature with a thick, golden coat, sharp, curved sabre teeth, and imposing, spiraling horns that curve back from its forehead, its eyes warm with affection, on a sun-dappled, idyllic wedding day, with lush greenery and blooming flowers surrounding them, as if in a secluded, enchanted glade, the entire scene intricately engraved on a weathered, wooden tablet with ornate, foliage-patterned borders and subtle, warm undertones that evoke a sense of timelessness and wonder."

I left it at that, even if I was unsatisfied with the modifications - I wasn't after intricate detail, for example.

The first attempt:

GPT_Image_1_A_black_Human_woman_wearing_a_white_wedding_dress_0.png

Well... As close to a bullseye as I'd yet seen. If you're being picky, the horns don't match and Buri is au naturel, but the prompt didn't say anything about him being clothed.

The next iteration went weird - prompt got altered to "A joyful black woman with dark brown skin, adorned with intricate braids and subtle golden jewelry, wears a flowing, lace-trimmed white wedding gown with a delicate veil, as she warmly embraces a majestic, golden-furred, bipedal creature with razor-sharp sabre teeth and curved horns protruding from its regal head, its piercing yellow eyes gleaming with affection, on their whimsical, magical wedding day, set against a rich, engraved wooden tablet with elegant, cursive script." - and well, none of the images were what I wanted, but, all the images did match the prompt.

Next attempt, I switched off the dynamic prompt enhancement, and told it to generate with the default prompt stated earlier, and it gives us 8 pictures that again fit the prompt very well - all 8 managed to look engraved (although some did look very shallow) and there were two stand-out matches:

GPT_Image_1_A_simple_wooden_carving_of_a_black_African_woman_w_6.png

GPT_Image_1_A_simple_wooden_carving_of_a_black_African_woman_w_5.png

The second is essentially there - what it lacks is down to the prompt missing those details. (Naomi's veil, Buri's suit, Buri's eyes being open)

I then did two more batches, but none were as good as either of these.

So, I switched to trying to inpaint the features desired.

Sadly, this is where it took a few tries.

First try - close Buri's eyes as they nuzzle.

I'm not a fan of the softening of Buri it's done alongside that though.

The first efforts to put Buri in a suit put them both in blue suits, and the fourth, which didn't, managed to get Buri wrong:

Default_Add_a_wedding_veil_for_her_and_a_dark_blue_suit_for_hi_0.jpg

That... Isn't a Xenaya.

The next attempt switched from FLUX-1 Kontext to GPT-image-1 for the inpainting, and gave this:

GPT_Image_1_Add_a_wedding_veil_for_her_and_a_dark_blue_suit_fo_0.png

Ok, that veil definitely is not carved. But, it does look like it's a piece of fabric that has been added and tucked in. As does the suit...

The next attempt instructed it to "Refinish with a darker wood texture throughout", and gave this:

GPT_Image_1_Refinish_with_a_darker_wood_texture_throughout_0.png

Which is a little too dark, and Buri's right sabre has become a beard.

Trying again using "Refinish with a slightly darker wood texture." as the inpainting prompt gave:

GPT_Image_1_Refinish_with_a_slightly_darker_wood_texture_0.png

Well, the sabre is wrong again. But it's definitely the closest I've seen yet. Perhaps the target is now window-hatch sized.
