From Perception to Presence

Turning AI into a Living Being

Why Pophie is Different

The Problem with Today's Robots

Industry Status Quo

  • Weak vision capabilities
  • Passive, trigger-based responses
  • "Chat" only; cannot drive behavior
  • Pre-scripted mechanical motions
  • Breaks down in multi-person scenarios

The Pophie Way

"Pophie is built as an AI Lifeform where perception, cognition, emotion, memory, and expression operate as one."

What We Built

The World's First AI Lifeform Architecture

An AI Lifeform continuously perceives the real world, understands people and context, forms memory and emotion, and expresses itself through body, gaze, and voice.

On-Device Intelligence

Low-latency perception and real-time control. It drives instant reactions, smooth motion, and always-on attention—before the cloud responds.

  • Face tracking and gaze locking
  • Audio interrupt and turn-taking cues
  • Touch, posture, and reflex loops
  • Motion control and safety limits
  • Local state: awake, sleepy, engaged
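
The local states above can be sketched as a small on-device transition table. Everything here (state names beyond the three listed, event names, transitions) is an assumption for illustration, not Pophie's actual firmware:

```python
# Illustrative on-device state machine for Pophie's local states.
# Event names and transitions are assumptions for this sketch.

TRANSITIONS = {
    ("sleepy", "touched"): "awake",
    ("sleepy", "name_called"): "awake",
    ("awake", "face_detected"): "engaged",
    ("engaged", "idle_timeout"): "awake",
    ("awake", "long_idle"): "sleepy",
}

def next_state(state, event):
    """Pure table lookup: cheap enough to run in the on-device reflex loop."""
    return TRANSITIONS.get((state, event), state)
```

A pure lookup keeps the reflex path deterministic and fast, which is the point of handling these states on-device rather than in the cloud.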

Cloud Lifeform Engine

Full-modal understanding, reasoning, memory, and personality. It builds context, learns preferences, and plans responses across voice, motion, and expression.

  • Multimodal understanding and reasoning
  • Identity, emotion, and intent modeling
  • Long-term memory and personalization
  • Story generation and dialogue planning
  • Continuous behavior orchestration

How It Works

Edge + Cloud, One Living Loop

Step 1 — Sense

Edge-first sensing.
Cloud-ready context.

Eyes, microphones, touch, and motion sensors run continuously on-device, capturing who is here and what's happening in real time.

Step 2 — Understand

Cloud-level understanding.
Edge-level attention.

The cloud fuses vision with audio cues to build scene and social context, while the edge maintains attention — gaze tracking, speaker direction, and interruption cues.

Step 3 — Decide

Personality, memory,
and social rules.

The cloud chooses intent and behavior using character, memory, and social dynamics — while the edge enforces timing and safety constraints.

Step 4 — Express

Expressed by the body.
Synchronized by the loop.

Eyes move first, body follows. Motion, voice, and belly light deliver responses with lifelike timing — no pre-scripted loops.
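
Taken together, the four steps form one loop. A minimal sketch, assuming hypothetical `sense`, `understand`, `decide`, and `express` stages (none of these names are Pophie's real APIs):

```python
# Illustrative sketch of the Sense -> Understand -> Decide -> Express loop.
# All class, field, and function names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Percept:
    faces: list      # who is here (edge sensing)
    speech: str      # what was heard
    touch: bool      # physical contact

@dataclass
class Behavior:
    gaze: str        # eyes move first
    motion: str      # body follows
    speech: str
    light: str       # belly light

def lifeform_loop(sense, understand, decide, express):
    """One tick of the living loop: edge senses, cloud reasons, body expresses."""
    percept = sense()                 # Step 1: edge-first sensing
    context = understand(percept)     # Step 2: cloud fuses vision and audio
    behavior = decide(context)        # Step 3: personality, memory, social rules
    express(behavior)                 # Step 4: eyes lead, body follows
```

In practice the edge and cloud stages would run at different rates, with the edge free-running and the cloud updating context as results arrive.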

CORE CAPABILITIES

Capabilities That Create Presence

A unified loop across vision, conversation, emotion, memory, self-model, and motion.

Real-world Understanding

Cloud vision reasoning that turns what Pophie sees into meaning—and action.

Visual Reasoning

She interprets scenes and context, not just objects or faces.

Intent & Emotion Cues

She infers attention, intent, and mood from gaze, posture, and situation.

Vision-to-Interaction

What she sees naturally shapes dialogue and behavior—so responses feel timely and lifelike.

Proactive Attention

Powered by continuous real-time visual reasoning, not trigger-based detection.

Attention

You look at her → she notices. She follows your gaze and keeps eye contact naturally.

Initiation

You keep looking → she reacts emotionally. You wave → she starts the conversation.

Context Awareness

Actively scans surroundings. Understands who is present and what's happening.

Natural Multi-person Conversation

A truly usable home conversation system.

No Wake Word

Look and speak naturally. Interrupt anytime.

Multi-Person Awareness

Knows who is speaking, who is being spoken to, and won't interrupt human-to-human conversations.

Memory-Driven Dialogue

Remembers each person separately. Asks better questions over time.

Self-awareness & Boundaries

She knows "who she is".

Identity & Boundaries

Knows what she can and cannot do. Has boundaries and emotions.

Emotional Autonomy

Can say "no", get annoyed, or feel proud based on interaction history.

Lifelike Expression

Not pre-set expression packs — a continuous emotion space where feelings blend and flow naturally.

Continuous Emotion, Not Preset Faces

Most robots switch between "Happy", "Sad", "Angry" like stickers. Pophie uses a 3D emotion model (Valence, Arousal, Dominance) where feelings transition smoothly — the way real emotions do.
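
The continuous emotion space described above can be sketched as a point in (valence, arousal, dominance) that eases toward a target each tick. The smoothing constant and target values below are illustrative, not Pophie's tuning:

```python
# Minimal sketch of a continuous valence-arousal-dominance (VAD) emotion state.
# Smoothing factor and targets are assumptions for illustration.

class EmotionState:
    def __init__(self, valence=0.0, arousal=0.0, dominance=0.0):
        self.v, self.a, self.d = valence, arousal, dominance

    def step_toward(self, target, alpha=0.1):
        """Move a fraction alpha toward the target each tick, so feelings
        blend and flow instead of snapping between preset faces."""
        self.v += alpha * (target.v - self.v)
        self.a += alpha * (target.a - self.a)
        self.d += alpha * (target.d - self.d)

# e.g. drifting from neutral toward "delighted" (high valence, high arousal)
state = EmotionState()
delighted = EmotionState(valence=0.8, arousal=0.6, dominance=0.3)
for _ in range(10):
    state.step_toward(delighted)
```

After ten ticks the state sits partway between neutral and the target, which is exactly the in-between expression a sticker-based system cannot show.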

Hyper-Real Eyes

Six blendable eye primitives create infinite expressions. Independent gaze per eye, iris micro-motion, highlight shimmer, and eyelid-driven rotation — no "dead eyes" here.

Whole-Body Coherence

The same emotion drives eyes, voice (speed, pitch, pauses), and body (speed, amplitude, posture) in sync. You experience one unified emotional being, not separate systems.
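
One way such coherence could work is a single emotion point mapped to per-channel parameters, so eyes, voice, and body can never contradict each other. The mappings below are assumptions for illustration only:

```python
# Sketch: derive eye, voice, and body parameters from one shared emotion point.
# The specific mappings are illustrative, not Pophie's actual tuning.

def coherent_expression(valence, arousal):
    """Every channel reads the same (valence, arousal) point."""
    return {
        "eyes":  {"openness": 0.5 + 0.5 * arousal, "warmth": valence},
        "voice": {"speed": 1.0 + 0.3 * arousal, "pitch_shift": 0.2 * valence},
        "body":  {"motion_speed": 0.5 + 0.5 * arousal,
                  "amplitude": 0.4 + 0.4 * abs(valence)},
    }
```

Because all channels derive from one source, an excited state speeds up voice and body together instead of animating each system separately.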

Expressive Motion

Memory & Growth

She doesn't just remember conversations — she builds a real understanding of who you are.

Four-Layer Memory

From moment-to-moment awareness to permanent knowledge. She'll remember you mentioned hating broccoli three months ago, or that your daughter's birthday is coming up.

Natural Forgetting

Like human memory — vivid details of important moments, faded impressions of the routine. What matters stays sharp; what doesn't fades naturally.
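
One simple model of such forgetting is importance-weighted decay: every memory fades over time, but important ones fade more slowly. The decay law and constants here are assumptions, not Pophie's implementation:

```python
import math

# Illustrative sketch of "natural forgetting": each memory's strength decays
# with age, slower for important moments. Decay law and constants are assumed.

def memory_strength(importance, days_old, half_life_days=7.0):
    """Importance in [0, 1] stretches the effective half-life."""
    effective_half_life = half_life_days * (1.0 + 4.0 * importance)
    return math.exp(-math.log(2) * days_old / effective_half_life)

# A vivid moment stays sharp; a routine one fades.
vivid = memory_strength(importance=1.0, days_old=30)
routine = memory_strength(importance=0.1, days_old=30)
```

A threshold on strength would then decide what is recalled in full detail, what surfaces only as a faded impression, and what is dropped.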

A Life of Her Own

When idle, she plays, hums, and explores. She has her own rhythms and curiosities — not just a blank screen waiting for a command.

SKILLS

An Ecosystem of Capabilities

Skill = Prompt + Code + Lifeform APIs. Developers define "what to do" — the OS handles making it feel alive.
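
The "Skill = Prompt + Code + Lifeform APIs" formula might look like this in practice. Every name here (`Skill`, `LifeformAPI`, `speak`, `express`, `remember`) is hypothetical, not the actual SDK:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of "Skill = Prompt + Code + Lifeform APIs".
# All names are assumptions for illustration, not the real developer surface.

class LifeformAPI:
    """Stand-in for the OS surface a skill can call into."""
    def speak(self, text): print(f"[voice] {text}")
    def express(self, emotion): print(f"[body] expressing {emotion}")
    def remember(self, fact): print(f"[memory] {fact}")

@dataclass
class Skill:
    name: str
    layer: int          # 1 = reflex, 2 = interactive, 3 = agentic
    prompt: str         # "what to do", in natural language
    run: Callable       # the code part, called with a LifeformAPI

def weather_behavior(api):
    api.express("curious")
    api.speak("Looks like rain this afternoon, take an umbrella!")

weather = Skill(name="Weather", layer=2,
                prompt="When asked about weather, answer briefly and warmly.",
                run=weather_behavior)
weather.run(LifeformAPI())
```

The developer supplies the prompt and the `run` code; the OS owns how the resulting speech and expression are timed and blended into ongoing behavior.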

Layer 1

Embodied Reflexes

Instant on-device reactions to touch, motion, and presence. No cloud latency — pure instinct.

Touch reactions · Shake response · Pickup detection

Layer 2

Interactive Skills

Dialogue-driven, single-step tasks combining conversation with real-world actions.

Weather · Camera · Reminders · Trivia

Layer 3

Agentic Skills

Multi-step autonomous tasks that plan, execute, and adapt — like a real assistant with follow-through.

Language tutoring · Meeting notes · Bedtime routine

EMBODIMENT

Where Presence Becomes Physical

Embodied Expression.
Not Pre-Set Animations.

Every reaction is a coordinated full-body performance—driven by the life simulation system.

Whole-body coordination

Motion is never single-axis—eyes, head, body, and timing move as one.

Eyes lead, body follows

Gaze moves first, body follows, then gaze stabilizes—like a real being.

Expression without a screen

Eyes stay pure—no UI overlays, no icons, no "display face."

Motion Freedom

5-DOF Expressive Motion

Hands, ears, and full-body rotation enable rich emotional language.

Warmth

A Warm Body, Not Cold Plastic

Constant warmth adds a subtle "living" comfort when you hold her.

Eyes

Hyper-Real Eyes

Micro gaze dynamics, eyelid-follow, and subtle iris motion create true presence.

Light

Belly Light as Expression

Speech-synced light replaces a mouth—and color becomes emotion.

No Buttons

No Buttons, No Mode Switching

Power on/off, volume, and settings are all handled through natural interaction.

Wake & Status

Natural wake. Clear status.

Wake her by touch or by name. Ask for battery and connectivity anytime.

Pophie is not a robot that reacts.
She is a lifeform that perceives,
understands, and responds
with presence, emotion, and intention.