Sulphur 2 and 10Eros: The Two Open-Source Models Pushing AI Video Past Its Limits

Sulphur 2 is a fully uncensored video generation model built on LTX 2.3. Paired with 10Eros, a variant optimized for image-to-video, it is arguably the most interesting option right now for anyone who wants AI video generation with no filters and no cloud dependency.

Why you should care

If generative video interests you even a little, this is one of those moments worth paying attention to. Not because the definitive model has arrived, but because something in the open-source landscape is shifting direction.

Until recently, the AI video world was split in two: on one side, the closed giants (Sora, Veo, Runway); on the other, WAN 2.2, the open-source benchmark for visual quality. LTX 2.3 had already tried to compete on different ground: speed and native audio. But it was held back by the safety filters imposed by Lightricks.

Now two independent projects have taken LTX 2.3 and unlocked it. Sulphur 2 strips away every censorship layer. 10Eros optimizes it for image-to-video, the most practical use case for content creators. These aren’t new models. They’re the same foundation, finally usable the way it should be.

What they actually are

Sulphur 2 is an uncensored fine-tune of LTX 2.3, developed by FusionCow and a small team of collaborators. It’s a 9-billion-parameter text-to-video model built on a Qwen 3.5 architecture that generates up to 20 seconds of video with synchronized audio. The core selling point: zero content filters. Any prompt, any subject, any scene. The model just generates.

LTX 2.3 is the latest version of Lightricks’ open-weight model, released in April 2026. Compared to LTX 2.2, it introduces a new VAE (Variational Autoencoder) that improves fine-detail sharpness, a text encoder four times larger for better prompt adherence, native 9:16 portrait support, 24/48 FPS options, and spatial and temporal upscalers. It supports text-to-video, image-to-video, audio-to-video, and video extension, all within a single model. The main advantage is speed: on comparable hardware, LTX 2.3 is up to about 18 times faster than WAN 2.2. A video that WAN takes 15-18 minutes to generate, LTX completes in 1-2 minutes.
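As a quick sanity check on the speed claim, the two time ranges quoted above can be turned into an implied speedup range. This is only arithmetic on the article's own figures; the function name is made up for illustration.

```python
def speedup_range(slow_minutes, fast_minutes):
    """Return (worst_case, best_case) speedup from two (low, high) time ranges."""
    slow_low, slow_high = slow_minutes
    fast_low, fast_high = fast_minutes
    worst = slow_low / fast_high   # quickest slow-model run vs slowest fast-model run
    best = slow_high / fast_low    # slowest slow-model run vs quickest fast-model run
    return worst, best


wan_minutes = (15, 18)  # WAN 2.2 generation time quoted in the text
ltx_minutes = (1, 2)    # LTX 2.3 generation time quoted in the text

worst, best = speedup_range(wan_minutes, ltx_minutes)
print(f"Implied speedup: {worst:.1f}x to {best:.1f}x")
```

Read this way, the 18x figure is the best case of the quoted ranges rather than a typical value.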

10Eros is the second protagonist of this release. Created by TenStrip, it’s a specialized merge of Sulphur 2 optimized specifically for image-to-video. It’s not a simple weight mix: it uses a layer-scaled merge of different training steps, a technique that preserves quality better than traditional LoRAs. The stated goal is high-quality I2V, with particular emphasis on prompt adherence and temporal coherence.
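The article does not spell out how the layer-scaled merge is done, so the following is only one plausible reading, not TenStrip's actual recipe: two checkpoints from different training steps are blended with a mixing weight that changes by layer depth. The key pattern `blocks.<n>.`, the linear ramp, and the default weights are all assumptions, and plain Python lists stand in for real tensors.

```python
import re


def layer_index(key):
    """Extract the block index from a hypothetical key like 'blocks.3.attn.weight'."""
    match = re.search(r"blocks\.(\d+)\.", key)
    return int(match.group(1)) if match else None


def layer_scaled_merge(ckpt_a, ckpt_b, num_layers, alpha_first=0.2, alpha_last=0.8):
    """Blend two checkpoints with a mixing weight that ramps linearly across layers.

    alpha is the share taken from ckpt_b. Keys without a layer index
    (embeddings, final norms) use the midpoint of the ramp.
    """
    merged = {}
    for key, weights_a in ckpt_a.items():
        weights_b = ckpt_b[key]
        idx = layer_index(key)
        if idx is None:
            alpha = (alpha_first + alpha_last) / 2
        else:
            t = idx / max(num_layers - 1, 1)  # 0.0 at the first layer, 1.0 at the last
            alpha = alpha_first + t * (alpha_last - alpha_first)
        merged[key] = [(1 - alpha) * a + alpha * b for a, b in zip(weights_a, weights_b)]
    return merged


# Toy checkpoints standing in for two training steps of the same model.
step_a = {"blocks.0.w": [0.0, 0.0], "blocks.1.w": [0.0, 0.0], "norm.w": [0.0]}
step_b = {"blocks.0.w": [1.0, 1.0], "blocks.1.w": [1.0, 1.0], "norm.w": [1.0]}

merged = layer_scaled_merge(step_a, step_b, num_layers=2)
print(merged)
```

Unlike a LoRA applied at load time, a merge of this kind produces a single set of weights, which is the property the text credits for the more stable behavior.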

TenStrip is the same developer who previously worked on distilled LoRA experiments for LTX, creating “cond_safe” versions that don’t degrade the base model’s fine-tuning. With 10Eros, he took it further: instead of loading a LoRA on top of the model, he directly merged weights to achieve more stable behavior. The model requires explicit prompt enhancement: LTX has limited autonomous reasoning over prompts, so you need to describe every movement, scene evolution, dialogue, and audio in detail. If you don’t ask for it, you don’t get it. This is an architectural characteristic of LTX, not a bug.
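To make the "if you don’t ask for it, you don’t get it" rule concrete, here is a small sketch of a structured prompt that flags which parts were left out. The `ScenePrompt` class and its field names are hypothetical and are not TenStrip's templates; they simply mirror the four things the text says must be described.

```python
from dataclasses import dataclass, field


@dataclass
class ScenePrompt:
    """Hypothetical container for the parts of a detailed video prompt."""
    subject: str
    movements: list = field(default_factory=list)  # one entry per described motion
    scene_evolution: str = ""  # how the shot changes over time
    dialogue: str = ""         # spoken lines, if any
    audio: str = ""            # ambient sound, music, Foley

    def missing_parts(self):
        """Names of the parts left empty, i.e. things the model is not asked for."""
        parts = {
            "movements": self.movements,
            "scene_evolution": self.scene_evolution,
            "dialogue": self.dialogue,
            "audio": self.audio,
        }
        return [name for name, value in parts.items() if not value]

    def render(self):
        """Join the filled-in parts into a single prompt string, one part per line."""
        sections = [self.subject]
        if self.movements:
            sections.append("Movement: " + "; ".join(self.movements))
        if self.scene_evolution:
            sections.append("Scene evolution: " + self.scene_evolution)
        if self.dialogue:
            sections.append("Dialogue: " + self.dialogue)
        if self.audio:
            sections.append("Audio: " + self.audio)
        return "\n".join(sections)


prompt = ScenePrompt(
    subject="A lighthouse keeper climbs a spiral staircase at dusk.",
    movements=["he grips the rail with his left hand", "the camera follows from behind"],
    audio="wind against the windows, boots on iron steps",
)
print(prompt.render())
print("Not requested:", prompt.missing_parts())
```

Checking `missing_parts()` before generating is a cheap way to catch a prompt that never mentions dialogue or scene changes.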

Why “uncensored” matters more than you’d think

The term “uncensored” in the AI world often gets associated with NSFW content. But here, the issue is broader and more technical.

Commercial video models apply safety filters at multiple levels: on the input prompt (rejecting certain words), during intermediate generation (blocking diffusion if “sensitive” patterns are detected), and on the final output (blurring or refusing the video). The original LTX 2.3 included these moderation layers, which in practice limited not just explicit content but also stylized violence, body horror, medical contexts, or simply subjects the AI interpreted as borderline. For a creator trying to generate a horror video, a fight scene, or a medical documentary, these filters are a real obstacle.

Sulphur 2 removes every moderation layer. The result is a model that responds exactly to the prompt: no refusals, no blurring, no “I’m sorry, I can’t generate this content.” For researchers, for niche content creators, for anyone who simply doesn’t want a company deciding what they can or cannot generate, this is a substantial difference.

The practical part: what you need to try them

Try it yourself

Sulphur 2

Download: SulphurAI/Sulphur-2-base on HuggingFace
Format: BF16 (9.53 GB) or quantized GGUF
ComfyUI: requires the updated LTXVideo nodes
Recommended VRAM: 12+ GB for FP8, 16+ GB for BF16
Prompt enhancer included (mmproj + q8_0 model)

10Eros I2V

Download: TenStrip/LTX2.3-10Eros
Kijai split files for FP8 Transformer version
Recommended distilled LoRAs: “cond_safe” version (not the full ones, which degrade the model)
Example workflows: LTX2.3-10Eros_Workflows
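The VRAM recommendations above can be read as a simple decision rule. The sketch below encodes the two thresholds from the list (12 GB for FP8, 16 GB for BF16); treating quantized GGUF as the fallback below 12 GB is an assumption, since the article lists GGUF as a format but gives no VRAM figure for it. The function name is made up for illustration.

```python
def pick_weight_format(vram_gb):
    """Suggest a weight format from available VRAM, using the article's thresholds."""
    if vram_gb >= 16:
        return "BF16"
    if vram_gb >= 12:
        return "FP8"
    # Assumption: below 12 GB, fall back to a quantized GGUF file.
    return "quantized GGUF"


for vram in (8, 12, 16, 24):
    print(f"{vram} GB -> {pick_weight_format(vram)}")
```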

For those who want to go deeper

From here on, we’re getting technical. If you’re more interested in the idea than the implementation, feel free to jump straight to the conclusion.

The LTX 2.3 architecture

LTX 2.3 is a Diffusion Transformer (DiT) with 22 billion total parameters; Sulphur 2 uses a 9B variant (the Qwen 3.5 base). Unlike WAN 2.2, which uses a Mixture-of-Experts architecture, LTX takes a more traditional latent transformer approach, which helps explain the speed gap.

The new VAE in LTX 2.3 is the most significant architectural improvement. Compared to the previous version, it better compresses fine details into latent space, reducing artifacts like melting zippers, warping faces, or textures “floating” on objects. It’s not a flashy change: it’s the absence of small visual annoyances that were previously unavoidable.

The text encoder is four times larger than in LTX 2.2, so the model can follow much more complex and nuanced prompts. However, LTX still has limited autonomous “reasoning” over prompts: it needs to be guided with detailed, structured descriptions. TenStrip provides specific templates for prompt enhancement, including templates for scenes with dialogue, diegetic audio, and Foley.

Sulphur 2 vs 10Eros: when to use which

Feature	Sulphur 2	10Eros
Focus	General Text-to-Video	Optimized Image-to-Video
Censorship	None	None
Prompt enhancement	Recommended	Required
VRAM (FP8)	~12 GB	~12 GB
VRAM (BF16)	~16 GB	~16 GB
Native audio	Yes	Yes
Max duration	20 sec	20 sec
LoRA compatible	Yes (with caution)	cond_safe only

The choice between the two depends on your workflow. If you’re starting from scratch with a text prompt, Sulphur 2 is the natural choice. If you have a starting image (a frame, a render, a photo) and want to animate it, 10Eros is superior thanks to its optimized merge. In practice, many creators use both: generate a first frame with an image generation model (Flux, SDXL), then animate it with 10Eros.

The open-source AI video landscape

As of May 2026, the open-source video model landscape has polarized around two main contenders:

Model	Parameters	Speed	Audio	Max Duration	Key Strength
LTX 2.3	22B total	⚡ Up to ~18x vs WAN 2.2	Native	20 sec	Speed + audio
WAN 2.2	14B MoE	🐢 Slow (baseline)	No	5 sec (extendable)	Cinematic quality
Sulphur 2	9B	⚡ Fast	Native	20 sec	Uncensored T2V
10Eros	9B	⚡ Fast	Native	20 sec	Uncensored I2V

WAN 2.2 remains the reference for pure visual quality, especially for cinematic motion and subject coherence. But it generates only 5-second videos (extendable with tricks), has no native audio, and is much slower. LTX 2.3 and its derivatives like Sulphur 2 win on speed, duration, and audio. A strategy many are adopting: rapid prototyping with LTX/Sulphur, final refinement with WAN 2.2.

Outside the open-source world, commercial models (Sora 2, Veo 3.1, Kling 3.0) offer superior quality, native 4K, and multi-shot storytelling, but at costs ranging from $0.10 to $0.50 per second of generated video.
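To put those per-second prices in perspective, here is the arithmetic for a clip at the 20-second maximum quoted for LTX 2.3. The rates are the article's own range; the function name is made up for illustration.

```python
def cost_range(seconds, low_rate=0.10, high_rate=0.50):
    """Return (low, high) cost in dollars for a clip, given per-second rates."""
    return seconds * low_rate, seconds * high_rate


low, high = cost_range(20)
print(f"20-second clip: ${low:.2f} to ${high:.2f}")
```

At these rates a single 20-second clip costs between $2 and $10, and that cost repeats for every retry.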

The bottom line

Key points:

Sulphur 2 unlocks LTX 2.3 by removing all censorship: it generates any content without filters
10Eros is the variant optimized for image-to-video, built with a layer-scaled weight merge
LTX 2.3 remains up to ~18x faster than WAN 2.2, with native audio and videos up to 20 seconds
The open-source AI video landscape now has a real uncensored alternative

We’re not at the point where an open-source model generates better video than Veo 3.1 or Sora 2. But we are at the point where you can generate exactly what you want, without anyone deciding for you what’s acceptable. And that’s a difference that goes beyond visual quality.