Testing Local AI Video Generation: A Short Film Experiment with Wan2.2 5B


Introduction

I've been curious about the recent advances in AI video generation, particularly local models like Wan2.2 5B. So I decided to create a short film to test the capabilities and document the technical process. The result is a 70-second anime-style narrative about human-AI collaboration, created entirely using local AI video generation.

The Experiment

Model Choice: Hardware Constraints Matter

The first challenge was model selection, and it came down to GPU memory: of the Wan2.2 family, the 5B-parameter TI2V model was the variant that fit comfortably on a single consumer GPU.
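If you're trying to reproduce this, a quick sanity check before committing to a model size is to see how much VRAM is actually free:

```bash
# Report GPU name plus total and free VRAM (NVIDIA GPUs only).
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```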

The Creative Process

Story Concept: "Human + AI Ascension"

The narrative follows three characters from the year 2000 to 2025.

The story arc moves from their early careers, through concerns about AI displacement, to ultimately finding collaborative success with AI systems.

Technical Implementation

Sample Prompts

Here are a few examples of the detailed prompts used:

Scene 1 - Young Software Engineer:

"Anime style, dynamic scene with camera zooming in, young software engineer Alex with messy brown hair and green hoodie coding enthusiastically on laptop, bustling urban street with people walking past, food trucks, street vendors, dynamic background activity, multiple moving elements, fingers flying over keyboard, vibrant blues and greens, bright energetic lighting, fluid camera movement"

Scene 14 - Jubilant Celebration:

"Anime style, massive jubilant crowds celebrating in city squares, anime-style characters of all ages cheering and raising hands in victory, confetti falling, no screens or text visible, diverse anime crowds celebrating human-AI collaboration successes through pure celebration and joy, brilliant golden celebratory lighting, festival atmosphere, sense of collective triumph and hope for the future, pure anime art style with no digital displays, emphasize anime character designs"

Technical Pipeline

Video Production Workflow

  1. Generation: Created 10 iterations of each scene (130 videos total) with the model invocation sketched above
  2. Selection: Chose the best version of each scene
  3. Text Overlays: Rendered title cards with FFmpeg's drawtext filter:

```bash
ffmpeg -i black.mp4 -vf "drawtext=text='Meet Alex - the coder':fontcolor=white:fontsize=60:x=(w-text_w)/2:y=(h-text_h)/2:enable='between(t,0,2)'" meet_alex.mp4
```

  4. Assembly: Concatenated the selected scenes with FFmpeg's concat demuxer (see the sketch after this list)
  5. Thumbnail Generation: Created a preview grid of the assembled film:

```bash
ffmpeg -i output.mp4 -vf "fps=1/2,scale=256:141,tile=8x6" thumbnail_grid.png
```
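For the assembly step, FFmpeg's concat demuxer keeps things lossless. A minimal sketch, assuming a hypothetical selected/ directory of chosen takes (the post doesn't give the actual file layout):

```bash
# Build a playlist of the chosen takes, one 'file' line per clip.
# Paths are illustrative; substitute your own selected-scene files.
printf "file '%s'\n" selected/scene_*.mp4 > playlist.txt

# -c copy concatenates without re-encoding; safe here because every
# clip comes from the same pipeline and shares codec, resolution, and fps.
ffmpeg -f concat -safe 0 -i playlist.txt -c copy final_film.mp4
```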

Scene Thumbnails Overview

The thumbnail grid proved invaluable for quality assessment: uploading it to a large multimodal LLM helped identify the best scenes and spot consistency issues across the narrative.
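The post doesn't say which model or interface handled that review, and a chat UI works fine, but if you wanted to script it, here is a hedged sketch against an OpenAI-compatible vision endpoint. The endpoint, model name, and key are all placeholders:

```bash
# Hypothetical review request to an OpenAI-compatible vision endpoint.
# API_BASE, API_KEY, and MODEL are placeholders, not from the original post.
# Note: base64 -w0 is GNU coreutils; macOS base64 takes different flags.
B64=$(base64 -w0 thumbnail_grid.png)
curl -s "$API_BASE/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "model": "$MODEL",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text",
       "text": "This is a tiled grid of frames from an anime short film. Flag the weakest takes and any character-consistency problems."},
      {"type": "image_url",
       "image_url": {"url": "data:image/png;base64,$B64"}}
    ]
  }]
}
EOF
```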

Quality Observations

Character Consistency: Maintaining character appearance across scenes proved challenging. The anime style helped reduce visible artifacts compared to a photorealistic approach.

Motion Quality: The model handled dynamic scenes well, with convincing camera movements and background activity.

Artifacts: While present, artifacts were minimized by the anime aesthetic and careful prompt engineering.

Results

Final Output

Video Preview

All scenes were generated locally with the Wan-AI/Wan2.2-TI2V-5B model.
