Thank you for the input! I recently upgraded my PC to handle Stable Diffusion, and I have 12GB of VRAM to work with at the moment. I also recently started self-hosting some applications on a VPS, so some of the basics are there.
As for what I’d like to do with Stable Diffusion: one of my hobbies is storytelling and worldbuilding. I would like to (one day) be able to work on a story with an LLM and then prompt it: “now give me a drawing of the character we just introduced to the story”, and the LLM would automagically rope in Stable Diffusion and produce a workable drawing with it. I think this is probably beyond the capabilities of the current tools, but it is what I would like to achieve. I will definitely look into LangChain to see what I can do with it.
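The Stable Diffusion half of that idea is already scriptable. Below is a minimal sketch, assuming a local AUTOMATIC1111 WebUI started with the `--api` flag, which exposes a `/sdapi/v1/txt2img` endpoint (the default port 7860 is assumed). The helper names `build_txt2img_payload` and `generate`, the prompt template and the default values are hypothetical; the LLM side (LangChain or anything else) would only need to produce the `description` string.

```python
import base64
import json
import urllib.request

# Assumption: A1111 WebUI running locally with --api on its default port.
A1111_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"


def build_txt2img_payload(description: str, style: str = "digital artwork",
                          cfg_scale: float = 7.0, seed: int = -1) -> dict:
    """Turn an LLM-written character description into a txt2img request body.

    The prompt template and the default values are illustrative, not tuned.
    """
    return {
        "prompt": f"character portrait, {description}, {style}",
        "negative_prompt": "blurry, lowres",
        "steps": 25,
        "cfg_scale": cfg_scale,
        "width": 512,
        "height": 512,
        "batch_size": 4,
        "seed": seed,  # -1 lets the WebUI pick a random seed
    }


def generate(payload: dict, url: str = A1111_URL) -> list[bytes]:
    """POST the payload and decode the base64-encoded images in the response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return [base64.b64decode(img) for img in result["images"]]


if __name__ == "__main__":
    payload = build_txt2img_payload("a young cartographer with ink-stained hands")
    print(json.dumps(payload, indent=2))
    # images = generate(payload)  # only works with a running WebUI
```

Keeping the payload builder separate from the HTTP call makes it possible to test the prompt side without a running WebUI, and leaves `generate` as the single function an LLM framework would have to call.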
That’s also where my questions about context length and cross-thread referencing come from. I did some work with ChatGPT and am amazed at how good a tool it is to “brainstorm with myself” when developing stories. However, it does not remember the story bits I was working on 2 hours ago, which kinda bummed me out … :)
I look them up at lemmyverse.net
I go there about once a week to see if there are new communities I might be interested in. I’m on a self-hosted single-user instance, so my “All” feed is identical to my “Subscribed” feed, and this is how I populate it.
Yeah, reducing CFG can help a lot. It sometimes feels to me that getting a good image is about knowing at what point to let loose …
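As a rough intuition for why lowering CFG loosens things up: with classifier-free guidance, the sampler predicts the noise twice per step, once conditioned on the prompt $c$ and once unconditioned ($\varnothing$, which is where UIs like A1111 feed in the negative prompt), and extrapolates between the two using the guidance scale $s$ (the CFG value):

$$\hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + s\,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr)$$

At $s = 1$ this reduces to the plain prompt-conditioned prediction $\epsilon_\theta(x_t, c)$; larger $s$ pushes every step harder along the direction the prompt adds, which is why high values tend to look overcooked while lower values leave the model more freedom.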
When I started, I was just copying prompts from online galleries like Civitai or Leonardo.ai, which gave me noticeably better images than what I had come up with myself before. However, it seemed to me that many of these images may also have been made with copied prompts, without understanding what’s really going on in them, so I started to experiment for myself.
What I do right now is build my images “from the ground up”, starting with super basic prompts like “a house on a lake” and working from there: first adding descriptions to get the image composition right, then working in the style I’m looking for (photography, digital artwork, cartoon, 3D render, …), and then working in enhancers to see what they change. I found that one has to be patient, change only one thing at a time, and always generate a couple of images (at least a batch of 8) to see if and how the result changes.
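That one-change-at-a-time loop can be scripted so that each batch differs from the baseline by exactly one modifier. A minimal sketch, assuming the seeds are kept fixed across batches (only the batch size of 8 comes from the workflow above) so that differences between batches come from the prompt change rather than from the noise; the modifiers, seed values and the `Job`/`one_change_at_a_time` names are hypothetical placeholders. Each resulting `Job` could then be sent to whatever generation backend you use.

```python
from dataclasses import dataclass

# Hypothetical values: swap in your own base prompt, modifiers and seeds.
BASE_PROMPT = "a house on a lake"
CANDIDATE_MODIFIERS = ["at dusk", "watercolor", "highly detailed", "mesmerizing"]
SEEDS = list(range(1000, 1008))  # 8 fixed seeds -> 8 comparable images per batch


@dataclass(frozen=True)
class Job:
    label: str
    prompt: str
    seed: int


def one_change_at_a_time(base: str, modifiers: list[str], seeds: list[int]) -> list[Job]:
    """Build a baseline batch plus one batch per modifier.

    Each variant differs from the baseline by exactly one modifier, and every
    batch reuses the same seeds, so the prompt change is the only variable.
    """
    jobs = [Job("baseline", base, s) for s in seeds]
    for mod in modifiers:
        prompt = f"{base}, {mod}"
        jobs.extend(Job(mod, prompt, s) for s in seeds)
    return jobs


if __name__ == "__main__":
    jobs = one_change_at_a_time(BASE_PROMPT, CANDIDATE_MODIFIERS, SEEDS)
    for job in jobs[:: len(SEEDS)]:  # print the first job of each batch
        print(f"{job.label:16} -> {job.prompt!r}")
```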
So I still comb through image galleries for prompting inspiration, but now, most of the time, I just pick one keyword or enhancer and see what it does to my own images.
It is a long process that requires many iterations, but I find it really enjoyable.
I just figured out that I can drag any of my images made with A1111 into the ComfyUI interface, and it sets up the corresponding workflow automatically. I was under the impression that this would only work for images created with ComfyUI in the first place. As it turns out, this gives great starting points to work with. I will play around with it tonight and see if I can use existing images as starting points for upscaling and ControlNet workflows.
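This works because the generation settings travel inside the PNG file itself as text chunks: A1111 writes its settings as a text entry named `parameters`, while ComfyUI stores `prompt` and `workflow` entries as JSON. Below is a minimal, stdlib-only sketch for inspecting those entries. It reads `tEXt` and `iTXt` chunks, skips `zTXt`, and does not verify CRCs; the function name and the script usage are hypothetical.

```python
import struct
import sys
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Return keyword -> text for the tEXt and iTXt chunks of a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts: dict[str, str] = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + body + 4-byte CRC
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"iTXt":
            key, _, rest = body.partition(b"\x00")
            compressed = rest[0] == 1  # compression flag byte
            # Skip flag + method bytes, then the language tag and translated keyword.
            _lang, _, rest = rest[2:].partition(b"\x00")
            _tkey, _, value = rest.partition(b"\x00")
            if compressed:
                value = zlib.decompress(value)
            texts[key.decode("latin-1")] = value.decode("utf-8")
        elif ctype == b"IEND":
            break
    return texts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        chunks = read_png_text_chunks(fh.read())
    # Expect "parameters" for A1111 images, "prompt"/"workflow" for ComfyUI ones.
    for key, value in chunks.items():
        print(f"--- {key} ---\n{value[:500]}")
```

Running it on an image (e.g. `python pnginfo.py image.png`, a hypothetical file name) shows roughly what ComfyUI has to work with when you drop that image into it.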
Do you happen to have a ComfyUI tutorial at hand that you can link, one that goes into some detail? These custom workflows sound intriguing, but I’m not really sure where to start.
Please do! I’m thinking of starting to make LoRAs as well, and the tool looks like it would make the process much easier. Let me know how it goes for you.
I started with the smallest plan available and later upgraded to the second smallest, which has 4GB RAM. I also rented additional disk space, so I have 30GB now. RAM and CPU are certainly fine now, but I don’t know yet about disk space. I read that Lemmy/Mastodon can eat up space quickly, and I have currently used about half of my disk space.
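To keep an eye on the disk space question before it becomes a problem, a tiny check built on Python's `shutil.disk_usage` can run from cron or by hand. A minimal sketch; the 80% threshold and the function name are arbitrary assumptions.

```python
import shutil


def disk_usage_report(path: str = "/", warn_at: float = 0.8) -> tuple[float, bool]:
    """Return (fraction of the filesystem used, whether it crosses warn_at)."""
    usage = shutil.disk_usage(path)
    fraction = usage.used / usage.total
    return fraction, fraction >= warn_at


if __name__ == "__main__":
    frac, warn = disk_usage_report("/")
    print(f"{frac:.0%} of disk used" + (" - time to clean up or resize" if warn else ""))
```

On a Docker-based setup, `docker system df` is also worth a look, since images, volumes and build cache all count against the same disk.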
You should be able to configure this differently: either switch off the confirmation mails completely or use the email credentials from another server.
I use Synapse as Matrix server and Element as client. It doesn’t need port 25 (8008 and 8448 are needed in my setup). On Lemmy and Mastodon I configured outgoing mail via SMTP through my existing mail hoster, so I don’t send mail from my own server. Also, everything I found while googling said to stay away from self-hosting email, as it is a hassle to avoid being immediately blocked as a spam server …
I use Synapse as the Matrix server and Element as the client on desktop and mobile. It does support video calls, but so far I have only tested them for a minute.
I spent a lot of time googling and on YouTube to get a basic understanding of what I was trying to achieve: at least 2 weeks of after-work time, or, if I had to guess, 40-50 hours in total. Getting a single piece to work by following a tutorial can be easy, but getting everything to work together was a struggle. Once I had a better grasp of what a reverse proxy is and how Docker containers work together in networks, the pieces started to fall into place.
I have fail2ban running as well; I didn’t mention it in the OP. I also closed all ports besides 80 and 443, which are routed through my NPM proxy. SSH is allowed, but only with key-based login; password authentication is disabled.
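Since the intent is that only SSH, 80 and 443 are reachable, it can be worth verifying that from outside the server rather than trusting the firewall config alone. A minimal sketch using plain TCP connects; it assumes SSH is on the default port 22, the candidate port list and command-line usage are hypothetical, and it only tells you whether a TCP handshake succeeds, not what is listening.

```python
import socket
import sys


def probe_ports(host: str, ports: list[int], timeout: float = 1.0) -> dict[int, bool]:
    """Return port -> True if a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, DNS failure, ...
            results[port] = False
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run from a machine outside the VPS, e.g.: python probe.py vps.example.org
    expected_open = {22, 80, 443}  # assumption: SSH on the default port
    candidates = [22, 25, 80, 443, 3000, 5432, 8080]  # hypothetical list to check
    for port, is_open in probe_ports(sys.argv[1], candidates).items():
        flag = "" if is_open == (port in expected_open) else "  <-- unexpected"
        print(f"port {port:5}: {'open' if is_open else 'closed'}{flag}")
```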
So far it’s running well, but I expect things to break when I need to update parts of it. I have a snapshot from which I can reinstall, but recurring backups still need to be set up.
I run Nginx with the Nginx Proxy Manager web UI, which makes setting up proxy hosts and handling Let’s Encrypt certificates really easy. I also use Portainer to manage my Docker containers. This works well for the stuff I mentioned above (mostly Nextcloud, Matrix and Lemmy).
If I can get Mastodon into the same setup, it’d be neat. I just found a lot of discussions about problems with it, so I thought I’d ask before spending a few hours in vain :)
The prompt was just an example; my prompts usually get quite a bit longer than that. But with SD 1.5 models I eventually manage to get what I want to see. I also find that throwing in qualifiers like “mesmerizing” does do something to the image, although it can be subtle.
However, what I wanted to say here is that in SDXL my prompting seems to go nowhere, and I feel I’m not able to get out the kind of image I have in my head. Keeping the prompt example: in SD 1.5, using a custom model like Deliberate 2.0, I’m able to end up with an image of a hat-wearing cat surrounded by surreal-looking candy pops (however the final prompt for it ends up reading). In SDXL my images “break” (i.e. start looking flat, unrefined or even bizarre) long before I can direct them towards my imagined result. None of my usual approaches, like reducing CFG, re-ordering prompts or using a variety of qualifiers, seem to work the way I’m used to.
And tbh, I think this is to be expected. These are new models, so we need new tools (prompts) to work with them. I just haven’t learned how to do it yet, and I’m asking how others do it :)