
VarCity: 3D and Semantic Urban Modelling from Images

In this video we see the results of the five-year VarCity research project at the Computer Vision Lab, ETH Zurich. The aim of the project was to automatically generate 3D city models from photos, such as those openly available online via social media.

The VarCity system uses computer vision algorithms to analyse and stitch together overlapping photographs. Point clouds are created from the matched points and then used to generate a geometric mesh or surface model. Other algorithms identify and tag different types of urban object such as streets, buildings, roofs, windows and doors. These semantic labels can then be used to query the model and automatically extract meaningful information about buildings and streets, as the video describes. In this way the VarCity project demonstrates one way in which comprehensive 3D city models could effectively be crowdsourced over time.
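As a rough illustration of the point-cloud-to-mesh step described above, the sketch below uses the open-source Open3D library to estimate normals for a point cloud and fit a surface mesh to it. This is a generic recipe rather than the VarCity pipeline, and the file names are hypothetical.

# Minimal sketch of turning a point cloud into a surface mesh with Open3D.
# Illustrative only; VarCity uses its own research pipeline.
import open3d as o3d

# Hypothetical input: a point cloud produced by multi-view stereo.
pcd = o3d.io.read_point_cloud("city_block.ply")

# Surface reconstruction needs oriented normals for each point.
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.5, max_nn=30)
)

# Poisson reconstruction fits a watertight triangle mesh to the points.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9
)

o3d.io.write_triangle_mesh("city_block_mesh.ply", mesh)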

It is also interesting that VarCity uses computer vision to connect real-time video feeds and content from social media to actual locations. This is used to determine local vehicle and pedestrian traffic. As the video suggests, there may be limitations to this method for determining urban dynamics across the city as it is dependent on access to a suitably large number of camera feeds. This also has implications for privacy and surveillance, which the VarCity team address by showing representative simulated views in place of the actual scenes. As such, the 3D modelling of urban regions can no longer be viewed as a neutral and purely technical enterprise.

The wider project covers four main areas of research:

  • Automatic city-scale 3D reconstruction
  • Automatic semantic understanding of the 3D city
  • Automatic analysis of dynamics within the city
  • Automatic multimedia production

A fuller breakdown of the VarCity project can be viewed in the video below.

The work on automatic 3D reconstruction is particularly interesting. A major difficulty with 3D city models has been the amount of manual effort required to create and update them through traditional 3D modelling workflows. One solution has been to procedurally generate such models using software such as ESRI’s CityEngine. With CityEngine, preset rules are used to randomly determine the values of parameters like the height of buildings, the pitch of roofs, and the types of walls and doors. This is a great technique for generating fictional cities for movies and video games. However, it has never been fully successful for modelling actually existing urban environments, because the outputs of procedurally generated models are only as good as their inputs: both the complexity of the rules used to generate the geometry and the representational accuracy of assets such as street furniture models and building textures, where these are applied.

Procedural generation also involves an element of randomness, requiring the application of constraints such as the age of buildings in specific areas, which determines which types of street furniture and textures should be applied. Newer districts may be more likely to feature concrete and glass, whereas much older districts will likely consist of buildings made of brick. The more homogeneous an area is in terms of age and design, the easier it is to procedurally generate, especially if it is laid out in a grid. Even so, there is always a need for manual adjustment, which takes considerable effort and may involve ground truthing. Using such methods for particularly heterogeneous cities like London is problematic, especially if regular updates are required to capture changes as they occur.
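To make the idea concrete, here is a minimal sketch of rule-based randomness constrained by a district's age. It is written in Python rather than CityEngine's own CGA rule language, and the districts, parameter ranges and materials are invented purely for illustration.

# Illustrative sketch of constrained procedural generation.
# Districts, parameter ranges and materials are made up for this example.
import random

# Hypothetical constraint: district age bands map to likely materials.
MATERIALS_BY_AGE = {
    "pre-1900": ["brick", "stone"],
    "post-1950": ["concrete", "glass"],
}

def generate_building(district_age: str) -> dict:
    """Randomly sample building parameters within rule-defined bounds."""
    return {
        "height_m": random.uniform(6.0, 40.0),        # rule: plausible height range
        "roof_pitch_deg": random.uniform(0.0, 45.0),  # rule: flat to steep roofs
        "material": random.choice(MATERIALS_BY_AGE[district_age]),
    }

# Generate a small block of buildings for an older district.
block = [generate_building("pre-1900") for _ in range(10)]
for building in block:
    print(building)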

For my own part, I’m currently looking at the processing of point cloud data, so it will be fascinating to read the VarCity team’s research papers, available here.

3D Imagery in Google Earth

Since 2006 Google Earth has included textured 3D building models for urban areas. Initially these were crowdsourced from enthusiastic members of the user community, who modelled them by hand with the aid of SketchUp (sold to Trimble in 2012) or the simpler Google Building Maker (retired in 2013). As the video above shows, from 2012 onward Google have instead been using aerial imagery captured at a 45-degree angle and employing photogrammetry to automate the generation of 3D building and landscape models. In the following video from the Nat and Friends YouTube channel, Google employees help explain the process.

As explained, Google Earth’s digital representation of the world is created with the aid of different types of imagery. For the global view, 2D satellite imagery is captured from above and wrapped around Google Earth’s virtual globe. The 3D data that appears when users zoom in to the globe is captured via aircraft.

Each aircraft has five different cameras. One faces directly downward while the others are aimed to the front, back, left and right of the plane at a 45-degree angle. By flying in a stripe-like pattern and taking consecutive photos with multiple cameras, the aircraft is able to capture each location it passes from multiple directions. However, the need to obtain cloud-free images means that multiple flights have to be made, so the images captured for any single location may be taken days apart. The captured imagery is colour corrected to account for different lighting conditions, and for some areas finer details like cars are even removed.

The photogrammetry process employed by Google works by combining the different images of a location and generating a 3D geometric surface mesh. Computer vision techniques are used to identify common features within the different images so that they can be aligned. A GPS receiver on the aircraft also records the position from which each photograph was taken, enabling calculation of the distance between the camera on the plane and any given feature within a photograph. This facilitates the creation of depth maps, which can be stitched together using the common features identified earlier to form a combined geometric surface mesh. The process is completed by texturing the mesh with the original aerial imagery. For regularly shaped objects like buildings this can be done very accurately with the aid of edge detection algorithms, which identify the edges of buildings in the imagery and help align them with the corresponding edges in the mesh. For organic structures this is more challenging.
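The feature-matching step is the part most easily illustrated with off-the-shelf tools. The sketch below uses OpenCV's ORB detector and a brute-force matcher to find common features between two overlapping photos and estimate a transform that roughly aligns them; a full photogrammetry pipeline would go further and recover relative camera poses before triangulating depth. This is a generic computer vision recipe, not Google's production pipeline, and the image file names are hypothetical.

# Sketch: match common features between two overlapping photos with OpenCV.
# Generic illustration only; not Google's photogrammetry code.
import cv2
import numpy as np

img1 = cv2.imread("aerial_view_1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("aerial_view_2.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute binary descriptors in each image.
orb = cv2.ORB_create(nfeatures=5000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors between the two images (Hamming distance suits ORB).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Estimate a homography from the best matches, rejecting outliers with RANSAC.
src = np.float32([kp1[m.queryIdx].pt for m in matches[:500]]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches[:500]]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

print(f"{int(mask.sum())} inlier matches; alignment transform:\n{H}")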

Google Earth includes imagery for many different levels of detail or zoom. According to the video, the number of images required is staggering, on the order of tens of millions. While the zoomed-out global view in Google Earth is only fully updated once every few years, the aerial imagery for particular urban areas may be updated in less than a year. Gathered over time, this imagery can enable users to observe changes, and this can be leveraged for analysis with the aid of Google’s Earth Engine.