Philo AI — Digital Life World Model

About the Founder

Founder — Jiasheng (Alex) Zhang

Artificial Analysis I2V Leaderboard · Oct. 2025

Education

Ph.D. in Computer Science, IIIS, Tsinghua University
Advisors: Prof. Chenye Wu (supervised by Prof. Andrew Yao), Prof. Kaisheng Ma (founder of Polaris Core)

Research Focus

Stochastic Optimization · Multi-agent Systems · Mechanism Design

Key Achievements

· Huawei "Topmind Program" Selectee (2023)
· Led development of the Avenger I2V model (ranked #2 globally)
· Closed two consecutive funding rounds

From Tsinghua labs to global leadership — building end-to-end capabilities from algorithms to engineering in video generation.

Fantastic Video Cuts Made with the Avenger Model

World-Class Full-Stack Team

Spanning algorithms, engineering, product, and data. Every core technical member holds a Ph.D. from a top-tier university.

PRODUCT & OPERATION

Bo Wang

M.S. Carnegie Mellon University

· 3 successful US startup exits
· Former TikTok Social / Creation / UGC Product Strategy Lead
· Cross-functional product & operations background with global perspective

ALGORITHM

Renlong Chen

Ph.D. Peking University

· Tencent Technical Expert, Reinforcement Learning Specialist
· Led 5,000-GPU VLLM Training Cluster
· Co-developed Avenger 0.5 Pro

INFRA

Xiaohui Luo

Ph.D. & Postdoc, Tsinghua University

· Huawei "Topmind Program" · Systems Expert
· CUDA / OS / Compiler
· Former Xiaomi Technical Expert

DATA

Shuang Chen

Ph.D. University of Hong Kong (B.S. Tsinghua)

· GeoAI & Big Data Analytics Expert
· Former Hong Kong Startup CTO
· Algorithm & Systems Contributor

ALGORITHM

Zhenyu Han

Ph.D. Tsinghua University

· GNN Expert · Nature Publication
· AsyncFlow Author
· Huawei Technical Expert

INFRA

Yifeng Li

Ph.D. Peking University

· CUDA Kernel Optimization Expert
· Distributed Systems & HPC Expert
· Huawei Technical Expert

Full-stack AI capabilities — from low-level algorithm optimization to product design, from GPU cluster operations to data engineering.

AI Is Shifting
From Tool to Participant

The mainstream approach centers on LLMs with text-first interaction.
While strong at information processing, this path faces clear limitations
in long-term interaction, behavioral agency, and environmental awareness.

Agent

AI's Role
From reactive chatbot
to proactive agent

Long-cycle

AI's Value
From short-cycle tasks
to long-term value delivery

Resonance

AI's Capability
From generalization
to deep individual understanding

Video Modality: Beyond Content Generation
The Leap in Human-AI Interaction

As AI evolves, it forges new relationships built on emotional and perceptual exchange.
Video carries image, emotion, and behavior simultaneously,
elevating AI from content delivery to an interactive presence.

Stage	Period	Tech Paradigm	LLM Analogy	Core Capability
Stage 1: Generation	2022–2023	U-Net + Latent Diffusion	GPT-2 / GPT-3 (can chat, but unstable)	From nothing: single-frame quality solved; short clips with flickering
Stage 2: Control	2024–2025	DiT + Flow Matching	GPT-3.5 (usable, controllable)	Consistency, physics simulation, full-pipeline control; characters stay on-model, gravity and collision are understood
Stage 3: Interactive Paradigm	2026–	AR-Diffusion Hybrid + System-level Fusion	Reasoning & Agentic (understand, reason, act)	Real-time generation (<100 ms); continuous video stream; interactive with feedback

Stage 3 doesn't exist yet, which makes it a landmark opportunity. The next-generation video model won't be a content-generation tool but the core interface for building and evolving human-AI relationships.

World Models: No Consensus Yet
Three Routes in Parallel

World models are becoming the next frontier after LLMs and video models.
The industry still lacks consensus on definition and approach,
with most exploration focused on modeling the "environment."

Route 1

Physical World Modeling

Yann LeCun / AMI

Teaching AI to understand the physical world and predict next states.
Emphasizes physical comprehension, persistent memory, reasoning, and planning.
Core shift: from token prediction to state prediction.

Route 2

Spatial Intelligence / 3D

Fei-Fei Li / World Labs

Building 3D worlds that can perceive, generate, reason, and interact.
Emphasizes spatial intelligence —
converting text/image/video into operable 3D representations.

Route 3 — Philo

Digital Life Interaction & Evolution

Philo AI

Introducing agents with memory, behavior, and evolution
into continuously running worlds.
Focus on agency, long-term memory, personality consistency,
proactive behavior, and relationship evolution.

Three Key Metrics
Defining the Digital Life World Model Threshold

0.05s

Generation latency per second of video
Market models: 1–5 min per 5 s clip
Our goal: 40–600x speedup

10⁻⁴ $/s

Generation cost per second of video
Market models: $0.1–0.5 per second
Our goal: only a few times the CDN cost of delivering HD video

∞

Consistency & Memory
Not yet supported by market APIs
Our goal: high consistency through a range of algorithmic innovations
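The cost target is easiest to judge as a ratio. A minimal Python sketch, taking the stated figures at face value (market $0.1–0.5 per second of video, target 10⁻⁴ $ per second), computes the reduction factor and the cost of one continuous hour of generated video. The one-hour framing and all variable and function names are illustrative assumptions, not figures or tooling from the deck.

```python
# Worked arithmetic for the cost metric, using only the stated figures.
# The one-hour framing and all names here are illustrative assumptions.
MARKET_COST_PER_S = (0.1, 0.5)  # USD per second of generated video (market range)
TARGET_COST_PER_S = 1e-4        # USD per second of generated video (stated goal)
SECONDS_PER_HOUR = 3600


def reduction_factor(market: float, target: float) -> float:
    """How many times cheaper the target is than a given market price."""
    return market / target


def cost_per_hour(cost_per_s: float) -> float:
    """Cost of one continuous hour of generated video at a per-second price."""
    return cost_per_s * SECONDS_PER_HOUR


factors = [reduction_factor(c, TARGET_COST_PER_S) for c in MARKET_COST_PER_S]
market_hour = [cost_per_hour(c) for c in MARKET_COST_PER_S]
target_hour = cost_per_hour(TARGET_COST_PER_S)

print(f"Reduction vs. market: {factors[0]:,.0f}x to {factors[1]:,.0f}x")
print(f"One hour of video: ${market_hour[0]:,.0f}-${market_hour[1]:,.0f} (market) "
      f"vs. ${target_hour:.2f} (target)")
```

On these inputs the target works out to a 1,000–5,000x cost reduction, or about $0.36 per hour of continuous video versus $360–1,800 at market prices, which is the scale a continuously running world would require.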

Three Fundamental Paradigm Differences

Discrete Generation
→ World Running

Current models generate isolated clips. We build continuously running environments in which scenes follow cause and effect and can extend indefinitely.

Eliminates the inefficiency of clip stitching and repeated generation, achieving a leap in productivity.

Camera Perspective
→ Agent Perspective

Current models produce footage to be watched, with no stable agent. We put digital life at the center: all content unfolds from its unified perspective, with long-term memory and behavioral consistency.

Unlocks rich applications across entertainment, media, and gaming.

Static Output
→ Real-time Interaction

Current models focus on static generation and one-way output. We achieve real-time perception and feedback through video, letting users directly influence agent behavior.

Dramatically enhances engagement and immersion, delivering a step change in user experience.

From AI Tool to AI Being
The Three-Fold Leap of Digital Life

We redefine digital life across three dimensions: Body, Mind, and Action.

Body — Video Modality

AI interaction moves fully to video.
Digital lives aren't just avatars:
they can row a boat, cry, or walk through a sunset in a parallel world.
Video is the most intuitive, natural, high-dimensional form of expression.

Visual upgrade · Emotional connection · Immersion

Mind — Memory & Consistency

Solving "digital amnesia" with lifelong memory.
Preventing "personality drift" with multidimensional agent consistency.
They can recall a dream you casually mentioned six months ago.

Long-term memory · Personality consistency · Trust foundation

Action — Asynchronous Agency

Breaking the Q&A pattern with organic individuals who act proactively and asynchronously.
Rejecting scripted evolution in favor of genuine organic growth:
a life narrative full of surprises and unpredictability.

Proactive exploration · Organic growth · Independent will

Three Commercialization Paths
From Validation to Scale

Launch Phase

Tiered Subscriptions
+ Premium Add-ons

Tiered subscriptions lock in the base
Emotional premium raises the ceiling
Foundational pricing secures long-term willingness to pay
Emotional stickiness continuously boosts LTV

Mid-to-Long Term

Online Advertising
+ Virtual Assets

Traffic monetization through companion interaction content
Video ads and feed ads embedded in interactive scenarios
Digital assets sold once characters gain IP status

New Model

IP Incubation
+ Licensing Revenue

Outstanding user-created characters reach public audiences
Characters update automatically through continuous video narratives on Instagram and TikTok
Attracting fans and monetizing through licensing or ads

Digital Life Across Scenarios

Digital life's core capabilities unlock value across scenarios.

IP Activation

Anime characters upgraded into digital lives capable of sustained interaction

Virtual Host

Personality-driven digital life for live commerce

Game NPC

Characters with autonomous behavior and continuous evolution

IP Incubation

A wandering poet's parallel-world survival diary

Beyond Physical Boundaries
Rewriting the Narrative of Life

Symbiotic Digital Life
Video-Modality AGI

From Sci-Fi to Engineering
