Sulphur 2 and 10Eros: The Two Open-Source Models Pushing AI Video Past Its Limits
Sulphur 2 is a fully uncensored video generation model built on LTX 2.3. Together with 10Eros, a variant optimized for image-to-video, it forms the most interesting pair right now for anyone who wants AI video generation with no filters and no cloud dependency.
Why you should care
If generative video interests you even a little, this is one of those moments worth paying attention to. Not because the definitive model has arrived, but because something in the open-source landscape is shifting direction.
Until recently, the AI video world was split in two: on one side, the closed giants (Sora, Veo, Runway); on the other, WAN 2.2, the open-source benchmark for visual quality. LTX 2.3 had already tried to compete on different ground: speed and native audio. But it was held back by the safety filters imposed by Lightricks.
Now two independent projects have taken LTX 2.3 and unlocked it. Sulphur 2 strips away every censorship layer. 10Eros optimizes it for image-to-video, the most practical use case for content creators. These aren’t new models. They’re the same foundation, finally usable the way it should be.
What they actually are
Sulphur 2 is an uncensored fine-tune of LTX 2.3, developed by FusionCow and a small team of collaborators. It’s a 9-billion-parameter text-to-video model built on a Qwen 3.5 architecture, generating up to 20 seconds of video with synchronized audio. The core selling point: zero content filters. Any prompt, any subject, any scene. The model just generates.
LTX 2.3 is the latest version of Lightricks’ open-weight model, released in April 2026. Compared to LTX 2.2, it introduces a new VAE (Variational Autoencoder) that improves fine-detail sharpness, a quadrupled text encoder for better prompt adherence, native 9:16 portrait support, 24/48 FPS options, and spatial and temporal upscalers. It supports text-to-video, image-to-video, audio-to-video, and video extension, all within a single model. The main advantage is speed: on comparable hardware, LTX 2.3 is up to about 18 times faster than WAN 2.2. A video that takes WAN 15-18 minutes to generate takes LTX 1-2 minutes.
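As a quick sanity check on the speed claim, the quoted timings can be turned into a speedup range. This is only arithmetic on the figures in the paragraph above, not a benchmark; the function name `speedup_range` is made up for illustration.

```python
# Generation times quoted in the text, in minutes (fastest, slowest).
WAN_MINUTES = (15, 18)
LTX_MINUTES = (1, 2)


def speedup_range(slow, fast):
    """Return (worst_case, best_case) speedup of `fast` over `slow`."""
    worst = min(slow) / max(fast)  # quickest slow run vs slowest fast run
    best = max(slow) / min(fast)   # slowest slow run vs quickest fast run
    return worst, best


worst, best = speedup_range(WAN_MINUTES, LTX_MINUTES)
print(f"LTX 2.3 speedup over WAN 2.2: {worst:.1f}x to {best:.1f}x")
# prints: LTX 2.3 speedup over WAN 2.2: 7.5x to 18.0x
```

On these numbers, 18x is the best case (18 minutes against 1 minute), and the worst case is 7.5x.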
10Eros is the second protagonist of this release. Created by TenStrip, it’s a specialized merge of Sulphur 2 optimized specifically for image-to-video. It’s not a simple weight mix: it uses a layer-scaled merge of different training steps, a technique that preserves quality better than traditional LoRAs. The stated goal is high-quality I2V, with particular emphasis on prompt adherence and temporal coherence.
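TenStrip's exact merge recipe isn't documented here, so the following is only a minimal sketch of what a "layer-scaled merge" of two training checkpoints could look like: the blend ratio changes with layer depth instead of being one global number. The key format `layers.<idx>.<name>`, the linear ramp, and the default ratios are all assumptions made for illustration, and plain Python lists stand in for real tensors.

```python
def layer_scaled_merge(ckpt_a, ckpt_b, alpha_first=0.2, alpha_last=0.8):
    """Blend two checkpoints with a ratio that varies by layer depth.

    ckpt_a, ckpt_b: dicts mapping "layers.<idx>.<name>" to lists of floats
    (hypothetical layout). The share taken from ckpt_b ramps linearly from
    alpha_first at layer 0 to alpha_last at the deepest layer.
    """
    if ckpt_a.keys() != ckpt_b.keys():
        raise ValueError("checkpoints must share the same parameter names")

    depths = {key: int(key.split(".")[1]) for key in ckpt_a}
    max_depth = max(depths.values()) or 1  # avoid dividing by zero

    merged = {}
    for key, weights_a in ckpt_a.items():
        t = depths[key] / max_depth                      # 0.0 .. 1.0
        alpha = alpha_first + t * (alpha_last - alpha_first)
        merged[key] = [
            (1 - alpha) * a + alpha * b
            for a, b in zip(weights_a, ckpt_b[key])
        ]
    return merged


# Toy checkpoints from two training steps: all zeros vs all ones.
step_early = {f"layers.{i}.w": [0.0, 0.0] for i in range(3)}
step_late = {f"layers.{i}.w": [1.0, 1.0] for i in range(3)}
merged = layer_scaled_merge(step_early, step_late)
print({k: [round(x, 2) for x in v] for k, v in merged.items()})
```

With the toy inputs, each merged value equals the blend ratio used for that layer, so the ramp from early to late layers is directly visible in the output.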
TenStrip is the same developer who previously worked on distilled LoRA experiments for LTX, creating “cond_safe” versions that don’t degrade the base model’s fine-tuning. With 10Eros, he took it further: instead of loading a LoRA on top of the model, he directly merged weights to achieve more stable behavior. The model requires explicit prompt enhancement: LTX has limited autonomous reasoning over prompts, so you need to describe every movement, scene evolution, dialogue, and audio in detail. If you don’t ask for it, you don’t get it. This is an architectural characteristic of LTX, not a bug.
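Since the model only produces what the prompt spells out, it can help to treat the prompt as a checklist. Here's a minimal sketch of that idea. The `ScenePrompt` class, its field names, and the output layout are hypothetical, and this is not TenStrip's actual template.

```python
from dataclasses import dataclass, field


@dataclass
class ScenePrompt:
    """Hypothetical checklist of the things LTX needs to be told explicitly."""
    subject: str
    motion: list = field(default_factory=list)
    scene_evolution: str = ""
    dialogue: str = ""
    audio: str = ""

    def render(self) -> str:
        sections = [
            ("", self.subject),
            ("Motion: ", "; ".join(self.motion)),
            ("Scene evolution: ", self.scene_evolution),
            ("Dialogue: ", self.dialogue),
            ("Audio: ", self.audio),
        ]
        # Skip empty sections and end every section with a single period.
        return " ".join(
            f"{label}{text.strip().rstrip('.')}."
            for label, text in sections
            if text.strip()
        )


prompt = ScenePrompt(
    subject="A fisherman on a wooden pier at dawn",
    motion=["he casts the line", "the camera slowly pushes in"],
    audio="gentle waves, a distant gull",
)
print(prompt.render())
# prints: A fisherman on a wooden pier at dawn. Motion: he casts the line; the camera slowly pushes in. Audio: gentle waves, a distant gull.
```

Empty fields are simply left out, which makes it easy to see at a glance what the prompt hasn't asked for yet.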
Why “uncensored” matters more than you’d think
The term “uncensored” in the AI world often gets associated with NSFW content. But here, the issue is broader and more technical.
Commercial video models apply safety filters at multiple levels: on the input prompt (rejecting certain words), during intermediate generation (blocking diffusion if “sensitive” patterns are detected), and on the final output (blurring or refusing the video). The original LTX 2.3 included these moderation layers, which in practice limited not just explicit content but also stylized violence, body horror, medical contexts, or simply subjects the AI interpreted as borderline. For a creator trying to generate a horror video, a fight scene, or a medical documentary, these filters are a real obstacle.
Sulphur 2 removes every moderation layer. The result is a model that responds exactly to the prompt: no refusals, no blurring, no “I’m sorry, I can’t generate this content.” For researchers, for niche content creators, for anyone who simply doesn’t want a company deciding what they can or cannot generate, this is a substantial difference.
The practical part: what you need to try them
Sulphur 2
- Download: SulphurAI/Sulphur-2-base on HuggingFace
- Format: BF16 (9.53 GB) or quantized GGUF
- ComfyUI: requires updated LTXVideo nodes
- Recommended VRAM: 12+ GB for FP8, 16+ GB for BF16
- Prompt enhancer included (mmproj + q8_0 model)
10Eros I2V
- Download: TenStrip/LTX2.3-10Eros
- Kijai’s split files for the FP8 transformer version
- Recommended distilled LoRAs: “cond_safe” version (not the full ones, which degrade the model)
- Example workflows: LTX2.3-10Eros_Workflows
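For anyone who prefers scripting the downloads, here's a minimal sketch using `snapshot_download` from the `huggingface_hub` package. The repository IDs come from the lists above. It assumes both repositories are hosted on HuggingFace; the file patterns and the `models` target folder are guesses that should be checked against the actual repository contents before running.

```python
REPOS = {
    "sulphur2": "SulphurAI/Sulphur-2-base",
    "10eros": "TenStrip/LTX2.3-10Eros",
}


def download_plan(model: str, target_root: str = "models") -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    repo_id = REPOS[model]
    return {
        "repo_id": repo_id,
        "local_dir": f"{target_root}/{repo_id.split('/')[-1]}",
        # Assumption: weights ship as .safetensors or .gguf files.
        # Narrow these patterns to avoid fetching every variant.
        "allow_patterns": ["*.safetensors", "*.gguf", "*.json"],
    }


def download(model: str) -> str:
    """Download the selected repository and return the local path."""
    # Imported lazily so the plan can be inspected without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_plan(model))


if __name__ == "__main__":
    # Print the plan only; call download("sulphur2") to actually fetch files.
    print(download_plan("sulphur2"))
```

Separating the plan from the download makes it possible to review what would be fetched before committing to several gigabytes of transfer.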
For those who want to go deeper
From here on, we’re getting technical. If you’re more interested in the idea than the implementation, feel free to jump straight to the conclusion.
The LTX 2.3 architecture
LTX 2.3 is a Diffusion Transformer (DiT) with 22 billion total parameters, with Sulphur 2 using a 9B variant (the Qwen 3.5 base). Unlike WAN 2.2, which uses a Mixture-of-Experts architecture, LTX takes a more traditional latent transformer approach, which helps explain the speed gap.
The new VAE in LTX 2.3 is the most significant architectural improvement. Compared to the previous version, it better compresses fine details into latent space, reducing artifacts like melting zippers, warping faces, or textures “floating” on objects. It’s not a flashy change: it’s the absence of small visual annoyances that were previously unavoidable.
The text encoder has been quadrupled compared to LTX 2.2. This means the model understands much more complex and nuanced prompts. However, LTX still has limited autonomous “reasoning” over prompts: it needs to be guided with detailed, structured descriptions. TenStrip provides specific templates for prompt enhancement, including templates for scenes with dialogue, diegetic audio, and Foley.
Sulphur 2 vs 10Eros: when to use which
| Feature | Sulphur 2 | 10Eros |
|---|---|---|
| Focus | General Text-to-Video | Optimized Image-to-Video |
| Censorship | None | None |
| Prompt enhancement | Recommended | Required |
| VRAM (FP8) | ~12 GB | ~12 GB |
| VRAM (BF16) | ~16 GB | ~16 GB |
| Native audio | Yes | Yes |
| Max duration | 20 sec | 20 sec |
| LoRA compatible | Yes (with caution) | cond_safe only |
The choice between the two depends on your workflow. If you’re starting from scratch with a text prompt, Sulphur 2 is the natural choice. If you have a starting image (a frame, a render, a photo) and want to animate it, 10Eros is superior thanks to its optimized merge. In practice, many creators use both: generate a first frame with an image generation model (Flux, SDXL), then animate it with 10Eros.
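The VRAM rows in the table can be put in context with a back-of-the-envelope estimate of how much memory the weights alone occupy at each precision. This is a rough sketch that assumes the 9-billion-parameter figure quoted for Sulphur 2. It ignores activations, the VAE, the text encoder, and any offloading, so it is not a measurement of real usage.

```python
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_gib(num_params: float, dtype: str) -> float:
    """Memory taken by the raw weights, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in ("fp8", "bf16"):
    print(f"{dtype}: {weight_gib(9e9, dtype):.1f} GiB of weights")
# prints:
# fp8: 8.4 GiB of weights
# bf16: 16.8 GiB of weights
```

On this estimate, FP8 weights leave some headroom on a 12 GB card, while BF16 weights alone are already at the edge of 16 GB. The arithmetic also doesn't reproduce the 9.53 GB listed for the BF16 download, so treat all of these figures as approximate.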
The open-source AI video landscape
As of May 2026, the open-source video model landscape has polarized around two main contenders, LTX 2.3 (with its derivatives Sulphur 2 and 10Eros) and WAN 2.2:
| Model | Parameters | Speed | Audio | Max Duration | Key Strength |
|---|---|---|---|---|---|
| LTX 2.3 | 22B total | ⚡ 18x vs WAN | Native | 20 sec | Speed + audio |
| WAN 2.2 | 14B MoE | 🐢 | No | 5 sec (extendable) | Cinematic quality |
| Sulphur 2 | 9B | ⚡ | Native | 20 sec | Uncensored T2V |
| 10Eros | 9B | ⚡ | Native | 20 sec | Uncensored I2V |
WAN 2.2 remains the reference for pure visual quality, especially for cinematic motion and subject coherence. But it generates only 5-second videos (extendable with tricks), has no native audio, and is much slower. LTX 2.3 and its derivatives like Sulphur 2 win on speed, duration, and audio. The optimal strategy many are adopting: rapid prototyping with LTX/Sulphur, final refinement with WAN 2.2.
Outside the open-source world, commercial models (Sora 2, Veo 3.1, Kling 3.0) offer superior quality, native 4K, and multi-shot storytelling, but at costs ranging from $0.10 to $0.50 per second of generated video.
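To make the per-second pricing concrete, here's the arithmetic for a single clip and for a batch of drafts. It uses only the $0.10 to $0.50 range quoted above. The 20-second length matches the LTX maximum, and the draft count of 50 is an arbitrary example.

```python
COST_PER_SECOND_USD = (0.10, 0.50)  # range quoted in the text


def clip_cost(seconds: float, clips: int = 1, rates=COST_PER_SECOND_USD):
    """Return (low, high) cost in USD for `clips` videos of `seconds` each."""
    low_rate, high_rate = rates
    total_seconds = seconds * clips
    return total_seconds * low_rate, total_seconds * high_rate


low, high = clip_cost(20)
print(f"One 20-second clip: ${low:.2f} to ${high:.2f}")
# prints: One 20-second clip: $2.00 to $10.00

low, high = clip_cost(20, clips=50)
print(f"50 drafts of 20 seconds: ${low:.2f} to ${high:.2f}")
# prints: 50 drafts of 20 seconds: $100.00 to $500.00
```

That's where local generation pays off for iteration-heavy work: drafts cost time and electricity rather than a per-second fee.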
The bottom line
Key points:
- Sulphur 2 unlocks LTX 2.3 by removing all censorship: it generates any content without filters
- 10Eros is the optimized variant for image-to-video, with advanced weight merging
- LTX 2.3 remains up to ~18x faster than WAN 2.2, with native audio and videos up to 20 seconds
- The open-source AI video landscape now has a real uncensored alternative
We’re not at the point where an open-source model generates better video than Veo 3.1 or Sora 2. But we are at the point where you can generate exactly what you want, without anyone deciding for you what’s acceptable. And that’s a difference that goes beyond visual quality.