You can make people misinterpret homophones

hedgehog@ttrpg.network · 4 hours ago

Is your goal to create things that can be published or used in a project, or to create audiobooks for yourself to listen to?

For voiceovers for text, I use Kokoro FastAPI, which has a web frontend. The frontend is only compatible with Chromium browsers on desktop or Android, which sucks as my daily drivers are Firefox and an iPhone (there are workarounds in the thread), but it supports voice mixing, speed changes, etc… It also has an issue where it keeps the models (about 3GB) in memory; I keep the CPU version loaded normally and swap to the GPU version if I need it to be faster. If you want something similar for Bark, check out Bark-GUI.
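If you'd rather skip the browser frontend entirely, here is a minimal sketch of calling the server from a script. It assumes the server is running locally on port 8880, exposes an OpenAI-style `/v1/audio/speech` route, and accepts a voice mix written as names joined with `+`; the URL, the voice names, and the helper names `build_speech_request` and `synthesize` are all illustrative, so check them against your own deployment.

```python
import json
import urllib.request

# Assumption: Kokoro FastAPI is running locally on port 8880 and exposes an
# OpenAI-style speech route. Adjust the URL to match your deployment.
KOKORO_URL = "http://localhost:8880/v1/audio/speech"


def build_speech_request(text, voices, speed=1.0, url=KOKORO_URL):
    """Build (but do not send) a TTS request.

    `voices` is a list of voice names; more than one is joined with "+",
    which is assumed to be how the server expresses a voice mix.
    """
    payload = {
        "model": "kokoro",
        "input": text,
        "voice": "+".join(voices),
        "speed": speed,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voices, out_path, speed=1.0):
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speech_request(text, voices, speed)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Something like `synthesize("Chapter one.", ["af_bella", "af_sky"], "ch1.mp3", speed=1.1)` would then write one clip. Keeping the request builder separate from the network call makes it easy to inspect the payload without a running server.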

I’ve also dabbled a bit in some TTS features that have Comfy nodes, though at this point mostly just in terms of getting them set up. For my purposes thus far Kokoro has been fine (and I prefer the FastAPI project over the Comfy nodes for most of my uses), but I’ve found nodes for Kokoro, Dia, F5 TTS, Orpheus, and Zonos.

Autiobooks and audiblez both look promising. A few weeks ago, I used the Kokoro FastAPI web frontend to create an audiobook for an ebook I worked on that used entirely self-hosted AI generation for the outlining and prose. Audiblez, which I found out about like two days after that, looks like it would have simplified that process substantially. Still, I’d personally like something more like an audiobook studio, where I can more easily swap voices back and forth, add emotions, play with speed on a more granular level, etc… I’m thinking about building something that contains that at some point myself, but it’ll be a minute - hopefully someone else will beat me there.

I posted a comment here a few weeks back on a similar topic. I’ve since used OpenReader-WebUI and like it, though that’s not for producing audiobooks, but for a read-along experience. Reproducing the comment below in case it’s helpful for you:

If you want to generate audiobooks using your own / a hosted TTS server, check out one of these options:

OpenReader-WebUI - this has built-in read along capability and can be deployed as a PWA that can allow you to download the audiobooks to your phone so you can use them offline
p0n1/epub to audiobook
ebook2audiobook

If you don’t have a decent GPU, Kokoro is a great option as it’s fast enough to run on CPU and still sounds very good. If you’re going to use Kokoro, Audiblez (posted by another commenter) looks like it makes that more of an all-in-one option.

If you want something that you can use without an upfront building of the audiobook, of the above options, only OpenReader-WebUI supports that. RealtimeTTS is a library that handles that, but I don’t know if there are already any apps out there that integrate it.

If you have the audiobook generation handled and just want to be able to follow along with text / switch between text and audio, check out https://storyteller-platform.gitlab.io/storyteller/

hedgehog@ttrpg.network · 4 hours ago

The witch turned the creep into a woman and the spell was complete by the time she flew away. Unfortunately, like many women, the creep was born with the body of a man (she’s AMAB). Maybe the witch could have changed her body, too, but that would have made things far too easy, given that the point of the curse was to teach her empathy.

hedgehog@ttrpg.network · 18 hours ago

SublimeText seems to have it. I don’t personally use it but it’s a pretty competent editor and it’s not in the feature table from the Wikipedia page someone else shared.

Sublime 3 was limited to folding by indentation; I’m not sure if that’s true for Sublime 4 as well, but the Markdown plugin docs have a note on folding and mention you can fold by section and heading levels.

hedgehog@ttrpg.network · 3 days ago

Your comment wasn’t in a meta discussion; it was on a post where they were venting about people complaining about them having a women’s only space. There was certainly no indication that the regular community rules didn’t apply, nor any invitation for men to comment.

Commenting that it’s hostile for them to have a women’s only space might be ironic, but couldn’t possibly be good faith, in that context. And if the same mod banned you from multiple communities, then either it was out of line and you could appeal it, or it was warranted due to the perceived likelihood of you causing problems in those other communities and the perceived low likelihood of you contributing anything of value to them.

Even now, you’re acting like the mod(s) banned you because of her / their emotions. You don’t see how that’s misogynistic?

It makes logical sense for bad actors to be preemptively banned. Emotions have nothing to do with it.

hedgehog@ttrpg.network · edit-2 5 days ago

Right now I have Ollama / Open-WebUI, Kokoro FastAPI, ComfyUI, Wan2GP, and FramePack Studio set up. I recently (as in yesterday) configured an API key middleware with Traefik and placed it in front of Ollama and Comfy, but currently nothing is using them yet.
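For anyone wiring up a client against a setup like this, here is a rough sketch of calling Ollama through a key-checking proxy. Ollama's `/api/generate` route and its `model`, `prompt`, and `stream` fields are real; the `Authorization: Bearer` header is only an assumption about what the middleware checks (a custom middleware may want a different header), and the hostname, key, and helper names are placeholders.

```python
import json
import urllib.request


def build_ollama_request(base_url, api_key, model, prompt):
    """Build a non-streaming /api/generate request with an auth header.

    Assumption: the proxy checks "Authorization: Bearer <key>"; a custom
    middleware may expect a different header name.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def generate(base_url, api_key, model, prompt):
    """Send the request and return the generated text."""
    req = build_ollama_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `stream` set to false, the server returns a single JSON object, so the text can be read straight from its `response` field instead of being reassembled from chunks.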

I’ll probably try out Devstral with one of the agentic coding frameworks, like Void or Anon Kode. I may also try out one of the FOSS writing studios (like Plot Bunni) and connect my own Ollama instance. I could use NovelCrafter but paying a subscription fee to use my own server for the compute intensive part feels silly to me.

I tried to use Open Notebook (basically a replacement for NotebookLM) with Ollama and Kokoro, with Kokoro FastAPI as my OpenAI endpoint, but it turns out it only supported, and required, text embeddings from OpenAI, so I couldn’t do that fully on my local machine. At some point, if they don’t fix that, I’m planning to either add support myself or set up some routes with Traefik where the ones Open Notebook uses point to the service I want to use.

ETA: n8n is one of the services I plan to set up next, and I’ll likely end up integrating both Ollama and Comfy workflows into it.

hedgehog@ttrpg.network · 6 days ago

You got the idea!

hedgehog@ttrpg.network · edit-2 6 days ago

We’re in c/showerthoughts. “What if my grandma was a bike?” would fit right in

hedgehog@ttrpg.network · 9 days ago

To be clear, I agree that the line you quoted is almost assuredly incorrect. If they changed it to “thousands of deepfake apps powered by open source technology” then I’d still be dubious, simply because it seems weird that there would be thousands of unique apps that all do the same thing, but that would at least be plausible. Most likely they misread something like https://techxplore.com/news/2025-05-downloadable-deepfake-image-generators.html, took “model variant” (which in this context generally means a LoRA) to mean “app,” and just jumped too hard on the “everything is an open source app” bandwagon.

I did some research - browsing https://github.com/topics/deepfakes (which has 153 total repos listed, many of which are focused on deepfake detection), searching DDG, clicking through to related apps from Github repos, etc…

In terms of actual open source deepfake apps, let’s assume that “app” means, at minimum, a piece of software you can run locally, assuming you have access to arbitrary consumer-targeted hardware - generally at least an Nvidia desktop GPU - and including it regardless of whether you have to write custom code to use it (so long as the code is included), use the CLI, hit an API, use a GUI app, a web browser, or a phone app. Considering only apps whose primary use case is creating deepfakes by face swapping videos, there are nonetheless several:

Roop
Roop Unleashed
Rope
Rope Live
VisoMaster
DeepFaceLab
DeepFaceLive
Reactor UI
inswapper
REFace
Refacer
Faceswap
deepfakes_faceswap
SimSwap

If you included forks of all those repos, then you’d definitely get into the thousands.

If you count video generation applications that can imitate people using, at minimum, Img2Img and 1 Lora OR 2 Loras, then these would be included as well:

Wan2GP
HunyuanVideoGP
FramePack Studio
FramePack eichi

And if you count the tools that integrate those, then these probably all count:

ComfyUI
Invoke AI
SwarmUI
SDNext
Automatic1111 SD WebUI
Fooocus
SD WebUI Forge
MetaStable
EasyDiffusion
StabilityMatrix
MochiDiffusion

If the potential criminals use easier ready-made (commercial) web-services instead of buying a RTX 5090, learning ComfyUI, dealing with the steep learning curve etc, we’d know we have to primarily fight those apps and services, not necessarily the generative AI tools.

This is the part where, to be able to answer that, someone would need to go and actually test out the deepfake apps and compare their outputs. I know that they get used for deepfakes because I’ve seen the outputs, but as far as I know, every single major platform - e.g., Kling, Veo, Runway, Sora - has safeguards in place to prevent nudity and sexual content. I’d be very surprised if they were being used en masse for this.

In terms of the SaaS apps used by people seeking to create nonconsensual, sexually explicit deepfakes… my guess is those are actually not really part of the figure that’s being referenced in this article. It really seems like they’re talking about doing video gen with LoRAs rather than doing face swaps.

hedgehog@ttrpg.network · 9 days ago

Without searching for them myself to confirm, it’s plausible, especially if you take it to mean “apps leveraging open source AI technology.”

There are a ton of open source AI repos, many of which provide video related capabilities. The number of true open source AI models is very slim, but “Open weight” AI models are commonly referred to as open source, and from the perspective of building your app, fine tuning the model, or creating Loras for it, open weight is good enough.

Some Loras come with details on the training data set, so even if the base model is only open weights, the Lora can still be open source.

Until recently, Civitai had Loras for famous people, e.g., Emma Watson, and apparently just regular people. There was a post here last week, I think (or maybe to some other community), linking to 404 Media, about those being taken down thanks to credit card processors drawing a line in the sand at deepfake imagery.

ComfyUI is a self hostable AI platform (and there are also many hosts that offer it) that lets you build a workflow from multiple nodes, each of which generally integrates some open source AI tech that was otherwise released. For example, there are nodes that add the capabilities to perform:

image generation with Stable Diffusion, Flux, Hidream, etc
TTS with KokoroTTS, Piper, F5 TTS, etc
video generation with AnimateDiff, Cog, Wan2.1, Hunyuan, FramePack, FantasyTalking, Float
video modification, e.g., LatentSync, which takes a video and lipsyncs it to a provided audio file
image manipulation, e.g., controlnet, img2img, inpainting, outpainting, or even specific tasks like “remove the background” or “change the face to this other face”

If you think of a deepfake as just a video of a recognizable person doing a thing, you can create a deepfake by:

taking an existing video and swapping the face in each frame
using faceswap video specific approaches, e.g., Roop
an image to video workflow, e.g., with Wan: “the person dances.” You can expand the options available with Wan by using Loras.
a text to video workflow, where you use a Lora for that person
an image+audio to video workflow, e.g., with FantasyTalking/Float, creating a lipsync to an audio file you provide
a video+audio to video workflow with LatentSync to make it look like they said something different, particularly using a TTS (like F5 TTS) that does voice cloning to generate the new audio

My suspicion is that most of the AI apps that are available online are just repackaging these open source technologies, but are not open source themselves. There are certainly some open source apps, of course, though the ones I know of are more generic and not deepfake specific (ComfyUI, SwarmUI, Invoke AI, Automatic1111, Forge, Fooocus, n8n, FramePack Studio, FramePack Eichi, Wan2GP, etc.).

This isn’t a licensing issue, as many open source projects are licensed with MIT or Apache licenses, which don’t require you to open source derivative products. Even if they used the GPL, it wouldn’t be required for a SaaS web app. Only the AGPL would protect against that, and even then, only the changes to the AGPL library would need to be shared; the front end app could still be proprietary.

The other issue could be them not knowing what “app” means. If you think of a Lora as an app, then the sentence might be accurate. I don’t know for sure that there were thousands of Loras for people that published their training data, but I wouldn’t be surprised if that were the case.

hedgehog@ttrpg.network · 12 days ago

Have you tried just setting the resolution to 1920x1080 or are you literally trying to run AAA games at 4K on a card that was targeting 1080p when it was released, 4 and a half years ago?

hedgehog@ttrpg.network · 17 days ago

It’s the new hyped up version of “no-code” or low-code solutions, but with AI so you have more flexibility to footgun.

hedgehog@ttrpg.network · 17 days ago

Not any lazier. Script kiddies didn’t write the code themselves, either.

hedgehog@ttrpg.network · 18 days ago

Are you talking about a warning for a self signed cert or for not using HTTPS?

hedgehog@ttrpg.network · 18 days ago

It was already known before the whistleblower that:

Siri inputs (all STT at that time, really) were processed off device
Siri had false activations

The “sinister” thing that we learned was that Apple was reviewing those activations to see if they were false, with the stated intent (as confirmed by the whistleblower) of using them to reduce false activations.

There are also black box methods to verify that data isn’t being sent and that particular hardware (like the microphone) isn’t being used, and there are people who look for vulnerabilities as a hobby. If the microphones on the most/second most popular phone brand (iPhone, Samsung) were secretly recording all the time, evidence of that would be easy to find and would be a huge scoop - why haven’t we heard about it yet?

Snowden and Wikileaks dumped a huge amount of info about governments spying, but nothing in there involved always on microphones in our cell phones.

To be fair, an individual phone is a single compromise away from actually listening to you, so it still makes sense to avoid having sensitive conversations within earshot of a wirelessly connected microphone. But generally that’s not the concern most people should have.

Advertising tracking is much more sinister and complicated and harder to wrap your head around than “my phone is listening to me” and as a result makes for a much less glamorous story, but there are dozens, if not hundreds or thousands, of stories out there about how invasive advertising companies’ methods are, about how they know too much, etc… Think about what LLMs do with text. The level of prediction that they can do. That’s what ML algorithms can do with your behavior.

If you’re misattributing what advertisers know about you to the phone listening and reporting back, then you’re not paying attention to what they’re actually doing.

So yes - be vigilant. Just be vigilant about the right thing.

hedgehog@ttrpg.network · 19 days ago

proven by a whistleblower from apple

Assuming you have an iPhone. And even then, the whistleblower you’re referencing was part of a team who reviewed utterances by users with the “Hey Siri” wake word feature enabled. If you had Siri disabled entirely or had the wake word feature disabled, you weren’t impacted at all.

This may have been limited to impacting only users who also had some option like “Improve Siri and Dictation” enabled, but it’s not clear. Today, the Privacy Policy explicitly says that Apple can have employees review your interactions with Siri and Dictation (my understanding is the reason for the settlement is that they were not explicit that human review was occurring). I strongly recommend disabling that setting, particularly if you have a wake word enabled.

If you have wake words enabled on your phone or device, your phone has to listen to be able to react to them. At that point, of course the phone is listening. Whether it’s sending the info back somewhere is a different story, and there isn’t any evidence that I’m aware of that any major phone company does this.

hedgehog@ttrpg.network · 19 days ago

Sure - Wikipedia says it better than I could hope to:

As English-linguist Larry Andrews describes it, descriptive grammar is the linguistic approach which studies what a language is like, as opposed to prescriptive, which declares what a language should be like.[11]: 25 In other words, descriptive grammarians focus analysis on how all kinds of people in all sorts of environments, usually in more casual, everyday settings, communicate, whereas prescriptive grammarians focus on the grammatical rules and structures predetermined by linguistic registers and figures of power. An example that Andrews uses in his book is fewer than vs less than.[11]: 26 A descriptive grammarian would state that both statements are equally valid, as long as the meaning behind the statement can be understood. A prescriptive grammarian would analyze the rules and conventions behind both statements to determine which statement is correct or otherwise preferable. Andrews also believes that, although most linguists would be descriptive grammarians, most public school teachers tend to be prescriptive.[11]: 26

hedgehog@ttrpg.network · 19 days ago

You might be interested in reading up on the debate of “Prescriptive vs Descriptive” approaches in a linguistics context.

hedgehog@ttrpg.network · 19 days ago

I had never heard of this show before today but what you just described makes it sound cool as fuck, I’m gonna check it out now

hedgehog@ttrpg.network · 20 days ago

You should try watching the live action series next - I bet you’d love it.

hedgehog@ttrpg.network · 21 days ago

The one I grabbed to test was the ROG Azoth.

I also checked my Iris and Moonlander - both cap out at 6, but I believe I can update that to be higher with QMK or add a config key via Oryx on the Moonlander to turn it on.
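For context on where a cap of 6 usually comes from: the standard USB HID boot-protocol keyboard report is 8 bytes, with one modifier byte, one reserved byte, and six keycode slots. The sketch below packs keys into that layout to show the limit. It is only an illustration of the report format, not what any of these boards' firmware literally runs, and `boot_report` is a made-up helper name.

```python
def boot_report(modifiers, keycodes):
    """Pack pressed keys into an 8-byte boot-protocol keyboard report.

    Layout: 1 modifier bitmask byte, 1 reserved byte, 6 keycode slots.
    Keys beyond the sixth are simply dropped in this sketch; real firmware
    may instead report a rollover error.
    """
    slots = list(keycodes)[:6]
    slots += [0] * (6 - len(slots))  # pad unused slots with 0 (no key)
    return bytes([modifiers & 0xFF, 0] + slots)


# Seven keys held at once (HID usage IDs 4..10 are the letters a..g),
# with left shift (bit 1 of the modifier byte) also down.
report = boot_report(0x02, [4, 5, 6, 7, 8, 9, 10])
```

Only the first six keycodes fit in `report`; the seventh has nowhere to go. NKRO modes get around this by sending a differently shaped report, which is why it tends to be a firmware or config toggle.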

hedgehog@ttrpg.network · 2 months ago

You can make people misinterpret homophones

hedgehog@ttrpg.network · 9 months ago

Meta trained its AI on almost all public posts since 2007

hedgehog@ttrpg.network · 1 year ago

Video - Palworld Modded with Pokemon