/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality!

We are back again (again).

Our TOR hidden service has been restored.



“The miracle, or the power, that elevates the few is to be found in their industry, application, and perseverance under the prompting of a brave, determined spirit.” -t. Mark Twain


Open file (81.05 KB 764x752 AI Thread v2.jpeg)
LLM & Chatbot General v2 GreerTech 05/30/2025 (Fri) 14:43:38 No.38824
Textual/Multimodal LLM & Chatbot General v2

Models: huggingface.co/
>How to bulk-download AI models from huggingface.co? ( >>25962, >>25986 )
Speech >>199
Vision >>97
Previous thread >>250

Looking for something "politically-incorrect" in a smol, offline model? Well Anon has just the thing for you >>38721
Lemon Cookie Project >>37980
Non-LLM chatbots >>35589
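Re the bulk-download question ( >>25962, >>25986 ): a minimal sketch using the official huggingface_hub client to pull a whole model repo in one call. The repo id and target folder below are just example values, not anything specific to the linked posts.

# Minimal sketch: bulk-download every file in a model repo from huggingface.co.
# Requires `pip install huggingface_hub`; repo_id and local_dir are example values.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2.5-7B-Instruct",      # example repo; swap in the model you want
    local_dir="models/qwen2.5-7b-instruct",  # example download location
    allow_patterns=["*.safetensors", "*.json", "*.txt"],  # skip files you don't need
)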
>>38824 Thanks, OP! :^)
Breaking in the new thread with a new manual update
-Updated list of models
-Added Odysee backups
-Added @Barf's prompt
-------
Odysee Backups
Social-Roleplay-Llama-Gemma PC AI Models
https://ody.sh/M8f3VALm7S
Social-Roleplay-Llama Smartphone AI Models
https://ody.sh/imSLyPOkBh
Qwen Coder AI Models
https://ody.sh/PzErEQF9S9
A smol'r alternative to llama.cpp. Includes a vision encoder. Appears to be Chinese.
https://github.com/li-plus/chatglm.cpp

>>38831
Neat! GG, Anon. Cheers. :^)
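Going off the chatglm.cpp README, its Python binding looks roughly like the sketch below; treat the package name, class names, and model path as assumptions and defer to the repo if anything has changed.

# Rough sketch of chatglm.cpp's Python binding as described in its README.
# Assumes `pip install chatglm-cpp` and a model already converted to GGML format;
# the model path is an example value.
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("models/chatglm3-ggml.bin")  # example converted model
reply = pipeline.chat([chatglm_cpp.ChatMessage(role="user", content="Hello!")])
print(reply.content)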
Edited last time by Chobitsu on 05/30/2025 (Fri) 21:02:49.
> (LLM dev -related : >>38845 )
>mfw I get blindsided epic style >>38853
>>38864
Looks good, but the prompt I posted with emotional tags is only for Orpheus, and it's just a stripped-down version of the one Orpheus ships with, which is this:
https://github.com/Lex-au/Orpheus-FastAPI/blob/main/System_Prompt.md
I had a different prompt for F5-TTS with emotions, but this is much better. Those are both good TTS engines that need a decent GPU with at least 8GB. For smaller, faster TTS in C++, sherpa-onnx is a newer option than Piper TTS:
https://github.com/k2-fsa/sherpa-onnx

That sucks for Backyard, since it was easy to install. SillyTavern/Open WebUI are a little harder, but no coding is needed with Pinokio, and they are all open source. SillyTavern is better for long-format ERP, while Open WebUI is more for productivity but has an OpenAI speech API option, which you can connect to a local Orpheus TTS server since it is a drop-in replacement for the OpenAI speech endpoints.
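To make that last point concrete, here's a minimal sketch of pointing the official OpenAI Python client at a local Orpheus-FastAPI server. The base URL, port, model name, and voice name are assumptions; use whatever your local server actually exposes.

# Minimal sketch: use a local Orpheus-FastAPI server as a drop-in OpenAI speech endpoint.
# The base_url, model, and voice values are assumptions; match them to your server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5005/v1",  # assumed local Orpheus-FastAPI address
    api_key="not-needed-locally",         # local servers usually ignore the key
)

with client.audio.speech.with_streaming_response.create(
    model="orpheus",   # assumed model name exposed by the server
    voice="tara",      # assumed voice name
    input="Hello Anon, the new thread is up.",
) as response:
    response.stream_to_file("reply.wav")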
>>38861
Fixed version. Also, I added more backups.
>>38888
Fixed version (again)
-Fixed copyright notice error
(Hopefully this is the last one lol)
> (LLM setup -related : >>39387 )
anyone wanna test my new ai gf implementation?
https://github.com/flamingrickpat/private-machine/tree/main
you'll need to be able to run llama-cpp-python with CUDA and enough VRAM for gemma-3-12b-it-q4_0.gguf; weaker models will most likely not work.
i rewrote the whole logic (pm_lida.py) since the last time i posted it here. it's inspired by the LIDA cognitive architecture and simulates emotions, needs, goals, intentions, and self-image. still in active development, nowhere near done. and very slow.
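For anyone unsure whether their setup clears the bar, below is a minimal sketch of loading that gguf with llama-cpp-python and full GPU offload. The model path and context size are assumptions; nothing here is specific to private-machine itself.

# Minimal sketch: confirm llama-cpp-python (built with CUDA) can load the model
# private-machine expects. model_path and n_ctx are assumptions; adjust to taste.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-3-12b-it-q4_0.gguf",  # assumed local path to the gguf
    n_gpu_layers=-1,  # offload every layer to the GPU; requires a CUDA build
    n_ctx=8192,       # assumed context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hi in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])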
>>39454
Hello Anon, welcome (back)! Please look around the board while you're here.
>its inspired by the lida cognitive architecture and simulates emotions, needs, goals, intentions, self image.
That sounds remarkable! If you pull this off well, I think it will be a major breakthrough for AI companions. I hope that someday you can work towards targeting much-smol'r hardware to run your waifu software on, Anon. That would be very helpful for us here. Please keep us all abreast of your progress, Anon. Good luck with this project! Cheers. :^)
>>39466
Thanks :3
My long-term goal is to finetune a smol model with the data generated by the agents. Recently I did some tests with LoRAs, adapters, and memory-layer adapters. I'm too smol-brained and poor to come up with something groundbreaking and actually train it. I hope to make use of the Google Titans architecture once it has some pre-trained models, or maybe even something beyond LLM tech, who knows? Then I wouldn't have to worry about memory agents; it would learn and update weights during inference. And the size/intelligence ratio will be better by then, so I could train it for cheap. Instead of the usual reasoning, it comes up with the whole unconscious thought process itself.
Right now I'm mostly just testing it with my Emmy character. Depending on how that works, my next step might be to let her loose in an environment, some virtual Minecraft-esque world to move around in and interact with. I already browsed the robot vision thread and found some interesting projects.
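Since LoRAs and adapters came up: a minimal sketch of attaching a LoRA adapter to a small causal LM with Hugging Face PEFT, roughly the starting point for finetuning on agent-generated dialogue. The base model id and target modules are assumptions.

# Minimal sketch: wrap a small causal LM with a LoRA adapter using PEFT.
# base_id, rank, and target_modules are assumptions; pick what fits your base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "google/gemma-3-1b-it"  # assumed smol base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights end up trainable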
>>39480
It's a very exciting-sounding project, Anon! :^)
>And the size / intelligence ratio will be better by then, so I could train it for cheap.
>Instead of the usual reasoning, it comes up with the whole unconscious thought process itself.
I'd sure like to hear extensive breakdowns of these two statements when you have the time, Anon. Cheers. :^)
>>39454 Much thanks!
>>39482
Sure, I'd love to elaborate.
>And the size / intelligence ratio will be better by then, so I could train it for cheap.
Basically, I'm waiting for some sort of breakthrough that makes inference and training much cheaper at the same computational power. Maybe the diffusion LLM projects, or RWKV, or something completely different. If a 1B model performs as well as the 12B model, I could train it overnight on my GPU and use that to update the personality bias and the memory-layer LoRAs without lobotomizing the model. Or someone comes up with a neural network architecture that really learns during inference and isn't just fancy autocomplete like transformers.
>Instead of the usual reasoning, it comes up with the whole unconscious thought process itself.
This one is interesting. There is a guy who already does something like this:
https://github.com/yukiarimo/yuna-ai
He trained his own model to generate different kinds of data depending on the situation:
><yuki>: User's dialogue
><yuna>: Companion's dialogue
><hito>: Other peoples' dialogue in the same conversation
><qt>: Internal thoughts and feelings
><action>: Function calls and actions
><data>: Embedded data or information
Talked to him, nice but weird dude.
This is like the reasoning of modern models, but for different aspects of world-building, and the tags are additional special tokens (usual models only have system, assistant, and user). In theory, a model could be trained to approximate the logic of all my agents: you start off with the dialogue, and the model dynamically generates the thoughts, goals, and emotional impact of the new input on the fly. This means I wouldn't have to make sooo many prompts with specialized agents just to get one specific output (such as a valence or anxiety delta as JSON).
I decided not to go any further with this, because I can't into math, and even if I manage to make progress, some chinese dude will have it perfected by the time my code works.
In the meantime, I'm getting into holographic waifus. Bought a Quest 3 and now I'm researching SLAM and segmentation.
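A minimal sketch of what "the tags are additional special tokens" looks like in practice with Hugging Face transformers, using the yuna-ai tag names quoted above; the base model id is an assumption.

# Minimal sketch: register the role tags as extra special tokens so a model can
# be trained to emit them. The tag list is from yuna-ai; base_id is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "google/gemma-3-1b-it"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

new_tags = ["<yuki>", "<yuna>", "<hito>", "<qt>", "<action>", "<data>"]
tokenizer.add_special_tokens({"additional_special_tokens": new_tags})
model.resize_token_embeddings(len(tokenizer))  # grow the embedding table for the new ids

# Each tag now maps to a single token id instead of being split into word pieces.
print(tokenizer.convert_tokens_to_ids("<qt>"))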
>>39619 Check out our visual waifus thread! >>240
>>39619
I don't know how to do this myself, but I read that the Chinese, I think, were using compute farms rented from Amazon or Google, if I remember correctly. It seems they have cheap computing power: those companies need massive compute, so they sell off the excess, and people were training AIs starting from a known pre-built AI for something like $200. That might be a way to cut cost. Start with a huge open-source AI, then train it on a waifu dataset for a few hundred dollars. Once trained, it can of course be duplicated endlessly.
A suggestion: when it's turned on, have a standard "command training function", very much like talking to little kids. Maybe an input password, which could be verbal, like "name of bot", do this, don't do that, stop. There should be a natural verbal command structure to correct it like a kid, so you don't get confused. Very simple two-to-three-year-old human commands, and each one of these would provide a "control vector" (see the sketch below), as talked about here:
>>31242 >>24943 >>31268 >>33184 >>35865
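On the control-vector idea: below is a minimal, library-agnostic sketch of the usual difference-of-means recipe, where you collect hidden states for contrasting prompt sets, average each set, and use the difference as a steering direction added back into the model's activations at inference. The get_hidden_state helper here is a hypothetical stand-in so the sketch runs on its own; real tooling (e.g. repeng) does the model-side part for you.

# Minimal sketch of the difference-of-means control vector recipe.
# get_hidden_state() is a hypothetical stand-in for "run the prompt through the
# model and grab the residual-stream activation at one chosen layer".
import hashlib
import numpy as np

def get_hidden_state(prompt: str, d_model: int = 512) -> np.ndarray:
    """Hypothetical stand-in: fakes an activation vector seeded from the prompt
    text so the sketch runs end to end; a real version would query the model."""
    seed = int(hashlib.sha256(prompt.encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).standard_normal(d_model)

command_prompts = ["Stop.", "Don't do that.", "Come here."]   # "command" behaviour
neutral_prompts = ["The sky is blue.", "Water is wet.", "Cats sleep a lot."]

pos = np.mean([get_hidden_state(p) for p in command_prompts], axis=0)
neg = np.mean([get_hidden_state(p) for p in neutral_prompts], axis=0)

control_vector = pos - neg
control_vector /= np.linalg.norm(control_vector)  # unit-length steering direction

# At inference you would add strength * control_vector to that same layer's activations.
strength = 4.0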
Here are instructions for integrating Open WebUI, F5-TTS voice cloning, and the KDTalker avatar, using Pinokio for an easier install. It just adds a button to generate the video avatar after the response, so it is optional.
https://github.com/Barfalamule/KDTalker-OpenWebUIAction
There are a lot of functions for Open WebUI, like web search, Home Assistant, weather, and memory. And you could pair it with OBS/a webcam for vision, and maybe output the video to a screenface.
It takes under 5 seconds for a short response on a 3090 for the audio, and 30 seconds for the video; expect roughly double that on a 5060 Ti 16GB. Or you could rent a B200 by the hour and have it be instant.
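This isn't Barf's actual Action code, just a sketch of the shape of the pipeline it describes: take the assistant's text reply, synthesize audio, then drive a portrait with that audio to get a short video. The endpoint URLs and payload fields are hypothetical placeholders; see the linked repo for the real Open WebUI integration.

# Sketch of the text -> TTS audio -> talking-head video chain described above.
# Both URLs and every payload field are hypothetical placeholders, not the real
# KDTalker-OpenWebUIAction API; see the linked repo for the actual integration.
import requests

TTS_URL = "http://localhost:7860/api/tts"         # hypothetical local F5-TTS endpoint
AVATAR_URL = "http://localhost:7861/api/animate"  # hypothetical local KDTalker endpoint

def speak_and_animate(reply_text: str, portrait_path: str) -> bytes:
    # 1) synthesize speech for the assistant's reply
    audio = requests.post(TTS_URL, json={"text": reply_text}, timeout=120).content

    # 2) drive the portrait image with that audio to get a short video clip
    with open(portrait_path, "rb") as img:
        video = requests.post(
            AVATAR_URL,
            files={"audio": ("reply.wav", audio), "image": img},
            timeout=300,
        ).content
    return video

if __name__ == "__main__":
    clip = speak_and_animate("Hello Anon!", "emmy.png")
    with open("avatar_reply.mp4", "wb") as f:
        f.write(clip)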
>>39802 It looks great; what you describe is definitely an upgrade over the normal offline local AIs I use. You should definitely do a video tutorial.
>>39803 Thanks. Will do at some point, but I have to censor everything for GitHub.
>>39802 >>39804
Hi Barf, thanks!
>...but have to censor everything for github
You're not beholden to Microsoft's GitHub, Anon. There are plenty of good alternatives.
---
Regardless, thanks for keeping us all up to date on your progress! Cheers. :^)
Edited last time by Chobitsu on 07/07/2025 (Mon) 15:36:44.
