Hi! I am Kaizhen Tan (Chinese name: 谭楷蓁). I am currently a master’s student in Artificial Intelligence at Carnegie Mellon University. I received my bachelor’s degree in Information Systems from Tongji University, where I built a solid foundation in programming, data analysis, and machine learning, complemented by interdisciplinary training in business and organizational systems.
My research sits at the intersection of Urban Management and Spatial Intelligence. I aim to build systems that are both spatially intelligent and socially aware, providing computational solutions to make cities more adaptive, inclusive, and human-centric.
Driven by the vision of harmonizing artificial intelligence with urban ecosystems, my research agenda is structured around four key topics. Please feel free to contact me if any of these resonate with you; I’d be happy to chat!
🤖 1. Embodied Urbanism & Collaborative Governance
Core Question: How should embodied intelligence move in cities, and how can humans govern and collaborate with them at scale?
- Robot-Friendly Urban Space: Redesign streets and building interiors for robot operations, setting standards for siting, infrastructure, protocols, and responsibility boundaries.
- Embodied Navigation: Vision-based navigation and task execution in real urban scenes, with accessibility-aware routing on ground-robot and drone platforms.
- Low-Altitude Governance: Derive operable air corridors from demand signals, then validate them in 3D city models against privacy, noise, crowding, and other constraints.
- Emerging Urban Devices & Deployment: Study city-scale rollout of robots, low-altitude systems, wearables, and BCI-like devices, focusing on deployment bottlenecks.
- Social Acceptance & Ethics: Model how different groups perceive risks and capabilities, guiding interaction design, rollout strategy, and public communication.
🎨 2. Social Sensing & Human-Environment Interaction
Core Question: How can multimodal human-centered data translate into actionable insights for urban planning and governance?
- AI-Enhanced Geospatial Analysis: Use large-scale spatial analytics to link urban form, environment, and mobility with human behavior and public service outcomes.
- Pedestrian-Oriented Design: Analyze walking experiences and accessibility barriers, integrating disabled people’s mobility needs into urban governance decisions.
- Urban Perception & Visual Aesthetics: Quantify streetscape perception and neighborhood imagery to inform design choices and regeneration priorities.
- Socio-Cultural Computing: Incorporate place-based narratives into LLM-enabled applications for communication, interaction, and inclusive governance.
🏙️ 3. Self-Evolving Urban Digital Twins and Agents
Core Question: How can we build a self-evolving urban digital twin that stays continuously updated, hosts agents, and supports sustainable and equitable city governance?
- Urban Foundation Models: Fuse remote sensing, street-level data, trajectories, IoT, text, and graphs into unified urban representations.
- Measurement & Sensing Loops: Develop scalable metrics and updating pipelines, using robots, drones, and wearables as new data sources for continuous urban sensing.
- Localization & Mapping: Advance geo-localization and semantic SLAM across point clouds, meshes, and 3D Gaussian representations for interactive 3D cities.
- Urban Agents: Build task agents for planning and public services, including map-LLM systems, spatial RAG, policy QA, and travel assistance.
- Policy Sandbox & Systems: Use the twin for what-if simulation, risk assessment, and execution checks, supported by efficient retrieval, caching, and rendering.
🚀 4. Spatial Intelligence & Foundation World Models
Core Question: How can world models support reliable spatial reasoning and actionable decision-making for physical agents?
- World Models & Architecture: Study generative, predictive, and representation-learning paradigms for forecasting and planning in the physical world.
- Embodied Representations: Unify geometry, semantics, physics, and action into shared representations, with a path toward richer embodied modalities.
- Long-term Memory & Self-Evolution: Build lifelong learning mechanisms with stability, forgetting control, and safety constraints for long-horizon autonomy.
- Neural-Inspired Reasoning: Improve interpretability and robustness of spatial reasoning, exploring 3D-aware encoders and alternatives to standard transformers.
🔥 News
- 2026.01: 🎉 The abstract co-authored with Prof. Fan Zhang has been accepted for the XXV ISPRS Congress 2026. See you in Toronto!
- 2025.12: 🎉 Our paper, led by my senior labmate Dr. Weihua Huan and co-authored with Prof. Wei Huang at Tongji University, was accepted by GIScience & Remote Sensing; honored to contribute as second author and big congratulations to Dr. Huan!
- 2025.10: 🔭 Joined Prof. Yu Liu and Prof. Fan Zhang’s team at Peking University as a remote research assistant.
- 2025.08: 🎉 Delivered an oral presentation at Hong Kong Polytechnic University after our paper was accepted to the Global Smart Cities Summit cum The 4th International Conference on Urban Informatics (GSCS & ICUI 2025).
- 2025.07: 🎉 My undergraduate thesis was accepted by the 7th Asia Conference on Machine Learning and Computing (ACMLC 2025).
- 2025.06: 🎓 Graduated from Tongji University—grateful for the journey and excited to continue my studies at CMU.
- 2025.04: 🔭 Completed the SITP project under the supervision of Prof. Yujia Zhai in the College of Architecture and Urban Planning.
- 2025.01: 💼 Joined Shanghai Artificial Intelligence Laboratory as an AI Product Manager Intern.
- 2024.09: 🌏 Conducted research at A*STAR in Singapore under the supervision of Dr. Yicheng Zhang and Dr. Sheng Zhang.
- 2024.04: 🔭 Began my academic journey at Prof. Wei Huang’s lab in the College of Surveying and Geo-Informatics, Tongji University.
📖 Education
💼 Experience
🔭 Research Experience
- 2025.10 - 2026.04, Research Assistant, Institute of Remote Sensing and Geographic Information System, Peking University, China
- 2024.09 - 2024.12, Research Officer, A*STAR Institute for Infocomm Research, Singapore
- 2024.04 - 2025.04, Research Assistant, College of Architecture and Urban Planning, Tongji University, China
- 2024.04 - 2024.12, Research Assistant, College of Surveying and Geo-Informatics, Tongji University, China
💻 Professional Experience
- 2025.01 - 2025.04, AI Product Manager, Shanghai Artificial Intelligence Laboratory, China
- 2023.01 - 2023.02, Data Analyst, Shanghai Qiantan Emerging Industry Research Institute, China
📝 Publications

UrbanVGGT: Scalable Sidewalk Width Estimation from Street View Images
Kaizhen Tan, Fan Zhang
- Accepted at XXV ISPRS Congress 2026
- Leveraged street-view imagery and VGGT-based 3D reconstruction to estimate metrically scaled sidewalk widths, built the SV-SideWidth dataset, and filled OpenStreetMap gaps for equitable assessment of pedestrian infrastructure.

Decoding Tourist Perception in Historic Urban Quarters with Multimodal Social Media Data: An AI-Based Framework and Evidence from Shanghai
Kaizhen Tan, Yufan Wu, Yuxuan Liu, Haoran Zeng
- Under Review at Computational Urban Science
- Developed an AI-powered multimodal framework to analyze tourist perception in historic Shanghai quarters, integrating image segmentation, color theme analysis, and sentiment mining for heritage-informed urban planning.

Multimodal Deep Learning for Modeling Air Traffic Controllers Command Lifecycle and Workload Prediction in Terminal Airspace
Kaizhen Tan
- Published in 7th Asia Conference on Machine Learning and Computing (ACMLC 2025)
- Designed a multimodal deep learning framework linking ATCO voice commands with aircraft trajectories to model workload dynamics, enabling intelligent command generation and scheduling support.

A Spatiotemporal Adaptive Local Search Method for Tracking Congestion Propagation in Dynamic Networks
Weihua Huan, Kaizhen Tan, Xintao Liu, Shoujun Jia, Shijun Lu, Jing Zhang, Wei Huang
- Published in GIScience & Remote Sensing (JCR Q1; IF = 6.9)
- Proposed a spatiotemporal adaptive local search (STALS) method combining dynamic graph learning and spatial analytics to model and mitigate large-scale urban traffic congestion propagation.
🔬 Projects

BlindNav: YOLO+LLM for Real-Time Navigation Assistance for Blind Users
Kaizhen Tan, Yufan Wang, Yixiao Li, Hanzhe Hong, Nicole Lyu
- BlindNav is a real-time, camera-based navigation assistant that uses YOLO for street-scene detection and a local LLM to turn those signals into concise voice guidance for blind and low-vision pedestrians.
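To illustrate the kind of glue logic such a pipeline needs, here is a minimal hypothetical sketch of the step between detection and language: turning YOLO-style detections into a concise text summary that a local LLM could verbalize. The class names, confidence threshold, and prompt format are my own assumptions for illustration, not BlindNav's actual implementation.

```python
# Hypothetical sketch (not BlindNav's real code): summarize YOLO-style
# detections as a short text prompt for a local LLM to turn into voice
# guidance. Labels, thresholds, and format are illustrative assumptions.

def bearing(x_center: float, frame_width: float) -> str:
    """Map a bounding-box center to a coarse direction in the frame."""
    ratio = x_center / frame_width
    if ratio < 0.33:
        return "left"
    if ratio > 0.66:
        return "right"
    return "ahead"

def detections_to_prompt(detections, frame_width=640):
    """Each detection is (label, box_center_x, confidence)."""
    lines = []
    for label, x_center, conf in detections:
        if conf < 0.5:  # drop low-confidence boxes
            continue
        lines.append(f"{label} {bearing(x_center, frame_width)}")
    if not lines:
        return "Path looks clear."
    return "Obstacles: " + "; ".join(lines)

# Example: a person on the left, a car centered in view
print(detections_to_prompt([("person", 100, 0.9), ("car", 320, 0.8)]))
# → Obstacles: person left; car ahead
```

Keeping the prompt this terse matters for the use case: a blind pedestrian needs one short spoken sentence per frame, not a paragraph, so the summarization happens before the LLM rather than relying on it to compress.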
💬 Presentations
- 2026.07 - XXV ISPRS Congress 2026
  UrbanVGGT: Scalable Sidewalk Width Estimation from Street View Images
  Toronto, Canada
- 2025.08 - Global Smart Cities Summit cum The 4th International Conference on Urban Informatics (GSCS & ICUI 2025)
  A Multidimensional AI-powered Framework for Analyzing Tourist Perception in Historic Urban Quarters: A Case Study in Shanghai
  Hong Kong Polytechnic University (PolyU), Hong Kong SAR, China
- 2025.07 - 7th Asia Conference on Machine Learning and Computing (ACMLC 2025)
  Multimodal Deep Learning for Modeling Air Traffic Controllers Command Lifecycle and Workload Prediction in Terminal Airspace
  Hong Kong SAR, China