In this video we see the results of VarCity, a five-year research project at the Computer Vision Lab, ETH Zurich. The aim of the project was to automatically generate 3D city models from photos such as those openly available online via social media.
The VarCity system uses algorithms to analyse and stitch together overlapping photographs. Point clouds are created from the points that the overlapping photographs share, and these are then used to generate a geometric mesh or surface model. Other algorithms identify and tag different types of urban objects such as streets, buildings, roofs, windows and doors. These semantic labels can then be used to query the model and automatically extract meaningful information about buildings and streets, as the video describes. In this way the VarCity project demonstrates one way in which comprehensive 3D city models could effectively be crowdsourced over time.
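To make the idea of querying semantic labels a little more concrete, here is a minimal sketch in Python. It assumes each mesh face carries a building identifier, a label and an area. The `Face` structure, the label names and the window-ratio query are my own hypothetical illustration, not VarCity's actual data format or method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    """One labelled mesh face (hypothetical structure, not VarCity's format)."""
    building_id: str
    label: str       # e.g. "wall", "window", "door", "roof"
    area_m2: float


# Labels assumed to make up a facade for the purposes of this sketch.
FACADE_LABELS = {"wall", "window", "door"}


def window_ratio_per_building(faces):
    """Return window area divided by total facade area for each building."""
    window_area = defaultdict(float)
    facade_area = defaultdict(float)
    for face in faces:
        if face.label in FACADE_LABELS:
            facade_area[face.building_id] += face.area_m2
        if face.label == "window":
            window_area[face.building_id] += face.area_m2
    return {
        building: window_area[building] / total
        for building, total in facade_area.items()
        if total > 0
    }


# Made-up example data for two buildings.
faces = [
    Face("A", "wall", 80.0),
    Face("A", "window", 20.0),
    Face("A", "roof", 50.0),
    Face("B", "wall", 45.0),
    Face("B", "door", 5.0),
]

for building, ratio in window_ratio_per_building(faces).items():
    print(f"Building {building}: {ratio:.0%} of the facade is window")
```

The point is simply that once geometry carries labels, a question about the city becomes a filter and a sum over faces rather than a manual survey.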
It is also interesting that VarCity uses computer vision to connect real-time video feeds or content from social media to actual locations. This is used to determine local vehicle and pedestrian traffic. As the video suggests, there may be limitations to this method for determining urban dynamics across the city, as it depends on access to a suitably large number of camera feeds. This also has implications for privacy and surveillance. The VarCity team address this by showing representative simulated views that replace actual scenes. As such, the 3D modelling of urban regions can no longer be viewed as a neutral and purely technical enterprise.
The wider project covers four main areas of research:
- Automatic city-scale 3D reconstruction
- Automatic semantic understanding of the 3D city
- Automatic analysis of dynamics within the city
- Automatic multimedia production
A fuller breakdown of the VarCity project can be viewed in the video below.
The work on automatic 3D reconstruction is particularly interesting. A major difficulty with 3D city models has been the amount of manual effort required to create and update them through traditional 3D modelling workflows. One solution has been to procedurally generate such models using software such as ESRI’s CityEngine. With CityEngine, preset rules are used to randomly determine the values of parameters like the height of buildings, the pitch of the roofs, and the types of walls and doors. This is a great technique for generating fictional cities for movies and video games. However, it has never been fully successful for modelling actually existing urban environments. This is because the outputs of procedurally generated models are only as good as the inputs: both the complexity of the rules used for generating the geometry and the representational accuracy of any assets that are applied, such as the models for street furniture and the textures for buildings.
Procedural generation also involves an element of randomness, which has to be constrained. The age of buildings in a specific area, for example, determines which types of street furniture and textures should be applied. Newer districts may be more likely to feature concrete and glass, whereas much older districts will likely consist of buildings made of brick. The more homogeneous an area is in terms of age and design, the easier it is to procedurally generate, especially if it is laid out in a grid. Even so, there is always a need for manual adjustment, which takes considerable effort and may involve ground truthing. Using such methods for particularly heterogeneous cities like London is problematic, especially if regular updates are required to capture changes as they occur.
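The combination of random parameters and constraints described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `RULES` table, the era names and the value ranges are made up, and it does not use CityEngine's actual rule language.

```python
import random
from dataclasses import dataclass

# Hypothetical rule table: parameter ranges and materials per district era.
# The numbers are invented for illustration only.
RULES = {
    "modern": {
        "height_m": (20.0, 60.0),
        "roof_pitch_deg": (0.0, 5.0),
        "materials": ["concrete", "glass"],
    },
    "historic": {
        "height_m": (8.0, 18.0),
        "roof_pitch_deg": (30.0, 50.0),
        "materials": ["brick"],
    },
}


@dataclass
class Building:
    height_m: float
    roof_pitch_deg: float
    material: str


def generate_building(era, rng):
    """Draw random parameters, constrained by the rules for the given era."""
    rule = RULES[era]
    return Building(
        height_m=rng.uniform(*rule["height_m"]),
        roof_pitch_deg=rng.uniform(*rule["roof_pitch_deg"]),
        material=rng.choice(rule["materials"]),
    )


# A seeded generator makes the "random" district reproducible.
rng = random.Random(42)
district = [generate_building("historic", rng) for _ in range(3)]
for building in district:
    print(building)
```

Every building produced this way is plausible for its era, but none is a measurement of a real building, which is exactly why manual adjustment and ground truthing are still needed.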
For my own part I’m currently looking at the processing of point cloud information so it will be fascinating to read the VarCity team’s research papers, available here.