Philo AI

Beyond Physical Boundaries
Rewriting the Narrative of Life

Digital Life World Model — Activating Video-Modality AGI

About The Founder

Founder — Jiasheng (Alex) Zhang

Artificial Analysis I2V Leaderboard · Oct. 2025

Education

Ph.D. in Computer Science, IIIS, Tsinghua University
Advisors: Prof. Chenye Wu (supervised by Prof. Andrew Yao), Prof. Kaisheng Ma (founder of Polaris Core)

Research Focus

Stochastic Optimization · Multi-agent Systems · Mechanism Design

Key Achievements

· Huawei "Topmind Program" Selectee (2023)
· Led the Avenger I2V model (ranked #2 globally)
· Successfully closed 2 consecutive funding rounds

From Tsinghua labs to global leadership — building end-to-end capabilities from algorithms to engineering in video generation.

Demo

Fantastic Video Cuts Made with Avenger Model

World-Class Full-Stack Team

Spanning algorithms, engineering, product, and data — all core technical members hold Ph.D. degrees from top-tier universities

PRODUCT & OPERATION

Bo Wang

M.S. Carnegie Mellon University

· 3 successful US startup exits
· Former TikTok Social / Creation / UGC Product Strategy Lead
· Cross-functional product & operations background with global perspective

ALGORITHM

Renlong Chen

Ph.D. Peking University

· Tencent Technical Expert, Reinforcement Learning Specialist
· Led 5,000-GPU VLLM Training Cluster
· Co-developed Avenger 0.5 Pro

INFRA

Xiaohui Luo

Ph.D. & Postdoc, Tsinghua University

· Huawei "Topmind Program" · Systems Expert
· CUDA / OS / Compiler
· Former Xiaomi Technical Expert

DATA

Shuang Chen

Ph.D. University of Hong Kong (B.S. Tsinghua)

· GeoAI & Big Data Analytics Expert
· Former Hong Kong Startup CTO
· Algorithm & Systems Contributor

ALGORITHM

Zhenyu Han

Ph.D. Tsinghua University

· GNN Expert · Nature Publication
· AsyncFlow Author
· Huawei Technical Expert

INFRA

Yifeng Li

Ph.D. Peking University

· CUDA Kernel Optimization Expert
· Distributed Systems & HPC Expert
· Huawei Technical Expert

Full-stack AI capabilities — from low-level algorithm optimization to product design, from GPU cluster operations to data engineering.

AI Is Shifting
From Tool to Participant

The mainstream approach centers on LLMs with text-first interaction.
While strong at information processing, this path faces clear limitations
in long-term interaction, behavioral agency, and environmental awareness.

Agent
AI's Role: from reactive chatbot to proactive agent

Long-cycle
AI's Value: from short-cycle tasks to long-term value delivery

Resonance
AI's Capability: from generalization to deep individual understanding

Video Modality: Beyond Content Generation
The Leap in Human-AI Interaction

AI evolution forges new relationships built on emotional and perceptual exchange.
Video carries image, emotion, and behavior simultaneously,
elevating AI from content delivery to an interactive presence.

Stage 1: Generation (2022–2023)
Tech paradigm: U-Net + Latent Diffusion
LLM analogy: GPT-2 / GPT-3 (can chat, but unstable)
Core capability: from nothing, single-frame quality solved; short clips with flickering

Stage 2: Control (2024–2025)
Tech paradigm: DiT + Flow Matching
LLM analogy: GPT-3.5 (usable, controllable)
Core capability: consistency, physics simulation, full-pipeline control; characters stay on-model, understands gravity & collision

Stage 3: Interactive Paradigm (2026–)
Tech paradigm: AR-Diffusion Hybrid + System-level Fusion
LLM analogy: Reasoning & Agentic (understand, reason, act)
Core capability: real-time generation <100ms; continuous video stream, interactive with feedback

Stage 3 doesn't exist yet, which makes it a landmark opportunity. The next-gen video model won't be a content generation tool, but the core interface for building and evolving human-AI relationships.

World Models: No Consensus Yet
Three Routes in Parallel

World models are becoming the next frontier after LLMs and video models.
The industry still lacks consensus on definition and approach,
with most exploration focused on modeling the "environment."

Route 1

Physical World Modeling

Yann LeCun / AMI

Teaching AI to understand the physical world and predict next states.
Emphasizes physical comprehension, persistent memory, reasoning, and planning.
Core shift: from token prediction to state prediction.

Route 2

Spatial Intelligence / 3D

Fei-Fei Li / World Labs

Building 3D worlds that can perceive, generate, reason, and interact.
Emphasizes spatial intelligence —
converting text/image/video into operable 3D representations.

Three Key Metrics
Defining the Digital Life World Model Threshold

0.05s
Per-second video generation latency
Market models: 1–5 min per 5s video
Our goal: 40–600x speedup

$10⁻⁴ / s
Per-second video generation cost
Market models: $0.1–0.5 / s
Our goal: only a few multiples of HD video CDN cost

Consistency & Memory
Market APIs: not yet supported
Our goal: algorithmic innovations for high consistency
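
As a rough check on the first two metrics, the figures above can be normalized to a per-second basis and compared. The sketch below is illustrative only: the constant names and the assumption that market latency scales linearly with clip length are not from the source. Under that reading the latency ratio works out to roughly 240–1200x and the cost ratio to roughly 1000–5000x, so the 40–600x speedup goal presumably rests on a different baseline than the one assumed here.

```python
# Back-of-envelope comparison of the stated targets against the quoted
# market figures. All names are hypothetical; only the numbers come from
# the metrics above.

TARGET_LATENCY_S = 0.05   # target: seconds of compute per second of video
TARGET_COST_USD = 1e-4    # target: dollars per second of video

MARKET_CLIP_SECONDS = 5.0               # quoted market clip length
MARKET_CLIP_LATENCY_S = (60.0, 300.0)   # quoted 1-5 minutes per clip
MARKET_COST_USD = (0.1, 0.5)            # quoted dollars per second of video


def ratio_range(baseline, target):
    """Return (low, high) ratios of a baseline range to a single target."""
    low, high = baseline
    return low / target, high / target


# Assumption: market latency scales linearly with clip length.
market_latency_per_s = tuple(
    t / MARKET_CLIP_SECONDS for t in MARKET_CLIP_LATENCY_S
)

latency_ratio = ratio_range(market_latency_per_s, TARGET_LATENCY_S)
cost_ratio = ratio_range(MARKET_COST_USD, TARGET_COST_USD)

# How many seconds of video the target produces per second of compute.
realtime_factor = 1.0 / TARGET_LATENCY_S

print(f"market latency per video-second: {market_latency_per_s}")
print(f"latency ratio (market / target): {latency_ratio}")
print(f"cost ratio (market / target):    {cost_ratio}")
print(f"target real-time factor:         {realtime_factor:.0f}x")
```

A real-time factor above 1 means video is produced faster than it plays back, which is the precondition for a continuous interactive stream.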

Symbiotic Digital Life
Video-Modality AGI

Building a world model centered on digital life interaction and evolution, powered by video modality.
Digital life on screen is transitioning from science fiction to engineering reality.

From Sci-Fi to Engineering

"Joi" from Blade Runner 2049, "Tu Yaya" from The Wandering Earth 2:
the human-AI relationship is entering a new phase.

Three Fundamental Paradigm Differences

Discrete Generation
→ World Running

Current models generate isolated clips. We build continuously running environments where scenes have cause-and-effect and can extend infinitely.

Eliminates the inefficiency of clip stitching and repeated generation, achieving a leap in productivity.

Camera Perspective
→ Agent Perspective

Current models generate footage to be watched from a camera's perspective, with no stable agent. We put digital life at the center: all content unfolds from a unified perspective with long-term memory and behavioral consistency.

Unlocks rich applications across entertainment, media, and gaming.

Static Output
→ Real-time Interaction

Current models focus on static generation and one-way output. We achieve real-time perception and feedback through video, letting users directly influence agent behavior.

Dramatically enhances engagement and immersion, a step change in user experience.

0:05 / 0:05
Current Models

Discrete Generation

Each generation is independent, with no causal link between clips.
User inputs prompt → generates → inputs again → generates again.

2:14 / ∞
Philo AI

A Continuously Running World

Digital life acts autonomously within a world that never stops.
Users can participate, guide, and observe.

From AI Tool to AI Being
The Three-Fold Leap of Digital Life

We redefine digital life across three dimensions: Body, Mind, and Action.

Body — Video Modality

AI interaction upgrades fully to video.
Digital life isn't just an avatar —
they can row a boat, cry, or walk through a sunset in a parallel world.
Video is the most intuitive, natural, high-dimensional form of expression.

Visual upgrade · Emotional connection · Immersion

Mind — Memory & Consistency

Solving "digital amnesia" with lifelong long-term memory.
Preventing "personality drift" with multidimensional agent consistency.
They can recall a dream you casually mentioned six months ago.

Long-term memory · Personality consistency · Trust foundation

Action — Asynchronous Agency

Breaking the Q&A pattern with organic, asynchronously proactive individuals.
Rejecting scripted evolution for genuine organic growth —
a life narrative full of surprises and unpredictability.

Proactive exploration · Organic growth · Independent will

Body — Video Modality

Full Upgrade of AI Interaction

Video is the most intuitive, natural, high-dimensional expression.
Digital life goes beyond avatars: they can row a boat, cry, or walk through a sunset in a parallel world.

Mind — Memory & Consistency

Solving Personality Drift

Lifelong long-term memory, stable personality across extended interactions.
They can recall a dream you casually mentioned six months ago.

Action — Asynchronous Agency

Organic Beings with Independent Will

Digital life doesn't just passively respond.
They independently explore and live within the digital world.

Three Commercialization Paths
From Validation to Scale

Launch Phase

Tiered Subscriptions
+ Premium Add-ons

  • Tiered subscriptions lock in the base
  • Emotional premium raises the ceiling
  • Foundational pricing secures long-term willingness to pay
  • Emotional stickiness continuously boosts LTV

Mid-to-Long Term

Online Advertising
+ Virtual Assets

  • Traffic monetization through companion interaction content
  • Ads embedded as video ads and feed ads in interactive scenarios
  • Digital assets sold once characters gain IP status

New Model

IP Incubation
+ Licensing Revenue

  • Outstanding user-created characters reach public audiences
  • Auto-updating via continuous video narratives on Instagram/TikTok
  • Attracting fans and monetizing through licensing or ads

Digital Life Across Scenarios

Digital life's core capabilities unlock value across scenarios.

0:32

IP Activation

Anime characters upgraded to sustainably interactive digital life

1:05

Virtual Host

Personality-driven digital life for live commerce

0:48

Game NPC

Characters with autonomous behavior and continuous evolution

0:21

IP Incubation

A wandering poet's parallel-world survival diary

Philo AI

The Emotional Foundation for AGI

We're not just building a model — we're creating the "new life" of the digital age, redefining the relationship between humans and AI.

Contact Us

Philo AI © 2026 — Activating Video-Modality AGI