Category Archives: 3D Virtual City Models

Open3D: Crowd-Sourced Distributed Curation of City Models

Open3D is a project by the Smart Geometry Processing Group in UCL’s Computer Science department. The project aims to provide tools for the crowd-sourcing of large-scale 3D urban models. It achieves this by giving users access to a basic 3D data set and providing an editor enabling them to amend the model and add further detail.

The model that users start with is created by vertically extruding 2D building footprints derived from OpenStreetMap or Ordnance Survey map data. Access to the resulting 3D model is provided through a viewer based on the Cesium JavaScript library for rendering virtual globes in a web browser. The interface allows users to select particular buildings to work on. As changes are made to the model with the Open3D editor they are parameterised behind the scenes: each change becomes a variable in an underlying set of repeatable rules that form templates representing common objects, such as different types of window or door. These templates can then be shared between users and reapplied to other, similar buildings within the model, which facilitates collaboration between multiple users and speeds up model creation.
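
To give a rough sense of what that starting block model involves, here is a minimal footprint-extrusion sketch in Python. It is purely illustrative and assumes nothing about the Open3D codebase: the function name and data layout are invented for this example.

    # Illustrative sketch only: turn a 2D building footprint into a simple prism.
    # The names and data layout here are hypothetical, not taken from Open3D.

    def extrude_footprint(footprint, height):
        """Extrude a 2D footprint (list of (x, y) vertices, counter-clockwise)
        into a prism: a base ring, a roof ring and one quad per wall."""
        base = [(x, y, 0.0) for x, y in footprint]
        roof = [(x, y, height) for x, y in footprint]

        walls = []
        n = len(footprint)
        for i in range(n):
            j = (i + 1) % n  # next vertex, wrapping around to close the ring
            walls.append([base[i], base[j], roof[j], roof[i]])

        return {"base": base, "roof": roof, "walls": walls}

    if __name__ == "__main__":
        # A 10 m x 6 m rectangular footprint extruded to a height of 12 m.
        building = extrude_footprint([(0, 0), (10, 0), (10, 6), (0, 6)], 12.0)
        print(len(building["walls"]), "wall faces")  # -> 4 wall faces

In the real system the footprints come from OpenStreetMap or Ordnance Survey data and the heights might come from attribute data or a default; the sketch only shows the basic extrusion step.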

Crowd-sourcing 3D urban models is not new. As we saw in an earlier post on 3D Imagery in Google Earth, Google’s acquisition of SketchUp in 2006 enabled enthusiasts to model and texture 3D buildings directly from satellite imagery. These models could then be uploaded to the 3D Warehouse, where they were curated by Google, who chose the best models for inclusion in their platform. Despite the enthusiasm of the user community there were limits to the speed of progress and the amount of coverage that could be achieved. In 2012 Google sold SketchUp to the engineering company Trimble after adopting a more automated process relying on a combination of photogrammetry and computer vision techniques. We recently saw similar techniques being used by ETH Zurich in our last post on their VarCity project.

In this context the Open3D approach, which relies heavily on human intervention, may seem outdated. However, while the kinds of textured surface models created using automated photogrammetry look very good from a distance, closer inspection reveals all sorts of issues. The challenges involved in creating 3D models through photogrammetry include: (i) gaining sufficient coverage of the object; (ii) the need to use images taken at different times in order to achieve sufficient coverage; (iii) having images of sufficient resolution to obtain the required level of detail; (iv) the indiscriminate nature of the captured images, in the sense that they include everything within the camera’s field of view, regardless of whether it is intended for inclusion in the final model or not. Without manual editing or further processing this can result in noisy surfaces and hollow, blob-like shapes for moving or poorly defined structures and objects. The unofficial Google Earth Blog has done a great job of documenting such anomalies within the Google platform over the years, including ghostly images and hollow objects, improbably deep rivers, drowned cities, problems with overhanging trees and buildings, and blobby people.

The VarCity project sought to address these issues by developing new algorithms and combining techniques to improve the quality of the surface meshes generated using aerial photogrammetry. For example, vehicle-mounted cameras were used in combination with tourist photographs to provide higher resolution data at street level. In this way the ETH Zurich team were able to improve the level of detail and considerably reduce noise in the building facades. Despite this, the results of the VarCity project still have limitations. For example, with regard to their use in first-person virtual reality applications it could be argued that a more precisely modelled environment would better support a sense of presence and immersion for the user. While such a data set would be more artificial by virtue of the hand-crafting involved in its production, it would also appear less jarringly coarse and feel more seamlessly realistic.

In their own ways both VarCity and Open3D seek to reduce the time and effort required to produce 3D urban models. VarCity uses a combination of methods and increasingly sophisticated algorithms to reduce noise in the automated reconstruction of urban environments. Open3D, on the other hand, starts with a relatively clean data set and provides tools to enhance productivity, while leveraging the human intelligence of users and their familiarity with the environment they are modelling to maintain a high level of quality. Hence, while the current output of Open3D may appear quite rudimentary compared to VarCity, its quality would improve through the efforts of the system’s potential users.

Unlike the VarCity project, in which crowd-sourcing was effectively achieved by proxy through the secondary exploitation of tourist photos gathered via social media, Open3D seeks to engage a community of users through direct and voluntary citizen participation. In this regard Open3D faces a considerable challenge. In order to work, the project needs to find an enthusiastic user group and engage them by providing highly accessible and enjoyable tools and features that lower the bar to participation. To that end the Open3D team are collaborating with the UCL Interaction Centre (UCLIC), who will focus on usability testing and adding new features. There is clearly an appetite for the online creation of 3D content, as the success of platforms like Sketchfab shows. Whether there is sufficient enthusiasm for the bottom-up creation of 3D urban data sets without the influence of a brand like Google remains to be seen.

For more information on Open3D check out the Smart Geometry Processing Group page or have a look at the accompanying paper here.

VarCity: 3D and Semantic Urban Modelling from Images

In this video we see the results of the five-year VarCity research project at the Computer Vision Lab, ETH Zurich. The aim of the project was to automatically generate 3D city models from photos, such as those openly available online via social media.

The VarCity system uses computer vision algorithms to analyse and stitch together overlapping photographs. Point clouds are created from features matched across the overlapping images and are then used to generate a geometric mesh or surface model. Other algorithms identify and tag different types of urban object, such as streets, buildings, roofs, windows and doors. These semantic labels can then be used to query the model and automatically derive meaningful information about buildings and streets, as the video describes. In this way the VarCity project demonstrates one way in which comprehensive 3D city models could effectively be crowd-sourced over time.
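
To make the idea of querying via semantic labels concrete, here is a toy Python sketch. The data layout and numbers are invented for illustration and are not VarCity’s actual representation; the point is simply that once mesh faces carry labels, questions such as a building’s window-to-wall ratio become straightforward queries.

    # Toy illustration of querying a semantically labelled mesh.
    # The face list and areas below are made up for the example.

    faces = [
        # (building id, semantic label, face area in square metres)
        ("bldg_01", "wall",   420.0),
        ("bldg_01", "window",  95.0),
        ("bldg_01", "roof",   180.0),
        ("bldg_02", "wall",   310.0),
        ("bldg_02", "window",  40.0),
    ]

    def window_to_wall_ratio(faces, building):
        """Sum labelled face areas for one building and return window area / wall area."""
        wall = sum(a for b, label, a in faces if b == building and label == "wall")
        window = sum(a for b, label, a in faces if b == building and label == "window")
        return window / wall if wall else 0.0

    print(round(window_to_wall_ratio(faces, "bldg_01"), 2))  # -> 0.23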

It is also interesting that VarCity uses computer vision to connect real-time video feeds and content from social media to actual locations, which it uses to estimate local vehicle and pedestrian traffic. As the video suggests, there may be limits to this method for determining urban dynamics across the whole city, as it depends on access to a suitably large number of camera feeds. It also has implications for privacy and surveillance, which the VarCity team address by showing representative simulated views in place of actual scenes. As such, the 3D modelling of urban regions can no longer be viewed as a neutral and purely technical enterprise.

The wider project covers four main areas of research:

  • Automatic city-scale 3D reconstruction
  • Automatic semantic understanding of the 3D city
  • Automatic analysis of dynamics within the city
  • Automatic multimedia production

A fuller breakdown of the VarCity project can be viewed in the video below.

The work on automatic 3D reconstruction is particularly interesting. A major difficulty with 3D city models has been the amount of manual effort required to create and update them through traditional 3D modelling workflows. One solution has been to procedurally generate such models using software such as ESRI’s CityEngine, in which preset rules are used to randomly determine values for parameters like the height of buildings, the pitch of roofs, and the types of walls and doors. This is a great technique for generating fictional cities for movies and video games, but it has never been fully successful for modelling actually existing urban environments. This is because procedurally generated models are only as good as their inputs: both the complexity of the rules used to generate the geometry and the representational accuracy of things like street furniture models and building textures, if they are to be applied.

Procedural generation also involves an element of randomness, requiring the application of constraints such as the age of buildings in specific areas, which determines the types of street furniture and textures that should be applied. Newer districts may be more likely to feature concrete and glass, whereas much older districts will likely consist of buildings made of brick. The more homogeneous an area is in terms of age and design, the easier it is to procedurally generate, especially if it is laid out in a grid. Even so there is always a need for manual adjustment, which takes considerable effort and may involve ground truthing. Using such methods for particularly heterogeneous cities like London is problematic, especially if regular updates are required to capture changes as they occur.
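
As a rough illustration of what such rules and constraints amount to, the Python sketch below samples building parameters at random within constraints attached to a district. The rules and values are entirely made up and are far simpler than CityEngine’s actual rule grammar, but they show why the output is only as good as the rules and attributes fed in.

    import random

    # Hypothetical, highly simplified procedural rule: parameters are drawn at
    # random, but constrained by an attribute of the district (its rough age).
    DISTRICT_RULES = {
        "historic": {"heights": (8, 20),  "roofs": ("pitched",), "materials": ("brick", "stone")},
        "modern":   {"heights": (20, 90), "roofs": ("flat",),    "materials": ("concrete", "glass")},
    }

    def generate_building(district):
        """Pick building parameters at random within the district's constraints."""
        rule = DISTRICT_RULES[district]
        return {
            "height_m": round(random.uniform(*rule["heights"]), 1),
            "roof": random.choice(rule["roofs"]),
            "material": random.choice(rule["materials"]),
        }

    random.seed(42)  # repeatable output for the example
    print(generate_building("historic"))
    print(generate_building("modern"))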

For my own part I’m currently looking at the processing of point cloud information so it will be fascinating to read the VarCity team’s research papers, available here.

The Art & Science of 3D Cities at the Transport Systems Catapult

Back in March I attended a day-long workshop at the Transport Systems Catapult (TSC) in Milton Keynes on the subject of ‘The Barriers to Building 3D Synthetic Environments’. The aim of the workshop was to bring together key SMEs and academics to collaboratively identify challenges and discuss solutions for the creation of virtual environments suitable for simulating and testing transport scenarios.

Alongside presentations from the Transport Systems, Future Cities and Satellite Applications Catapults, a number of SMEs also presented on topics as diverse as LiDAR data capture, GNSS positioning, 3D GIS and the use of GIS data in game engines. For my purposes the following talk on ‘The Art & Science of 3D Cities’ by Elliot Hartley of Garsdale Design was particularly interesting and raised a number of great points:

One of the key challenges for the generation and use of 3D data discussed by Elliot derives from the heightened expectations created by depictions of 3D urban environments in films, video games and Google Earth. The truth is that creating these kinds of environments requires considerable investment in both time and money. Elliot’s talk poses key questions for stakeholders embarking on a 3D project:

  • Why do you want a 3D model?
  • Do you actually need a 3D model?
  • What kind of 3D model do you want?
  • What 3D model do you actually need?
    • Small areas with lots of detail?
    • Large areas with little detail?
  • How much time and/or money do you have?
  • Will you want to publish the model?
  • What hardware and software do you have?
  • What’s the consequence of getting the model wrong?

While the primary focus of the day was on the practical and technical challenges of creating 3D environments, the further implication of Elliot’s discussion is that the use of 3D data and the creation of virtual environments can no longer be considered a purely technical activity with neutral products and outputs. For me the last question in particular foregrounded the stakes involved in moving beyond visualisation toward the growing use of 3D data in various forms of analysis. Thanks to Elliot for the stimulating talk.

After the presentations we had a tour of the TSC facilities and then broke into working groups to discuss a number of themes. A report and summary are expected to be published by the TSC soon.

A Brief History of Google Maps…and a not so Brief Video

In this long but useful presentation from 2012, Google Maps vice president Brian McClendon and colleagues provide a detailed overview of the platform’s evolution. Some of the key points are summarised below.

In the mid-90s Silicon Graphics developed the ‘Space-to-Your-Face’ demo to demonstrate the power of their Onyx InfiniteReality graphics workstation. In the demo the view zooms from orbit to the Matterhorn via Lake Geneva, using a combination of satellite imagery, aerial imagery and terrain data. The demo is included in the Silicon Graphics showreel from 1996, which can be viewed on YouTube here.

In 2001 the company Keyhole was founded as a startup providing mapping for the travel and real estate industries on a subscription basis. After achieving wider recognition through its use by CNN during the invasion of Iraq in 2003, the company was acquired by Google in 2004.

At the same time Google were working on the creation of Google Maps, which used a combination of client-side processing via AJAX and pre-rendered map tiles to enable its highly interactive, smooth-scrolling ‘slippy map’ system. Now that network bandwidth and processing power have increased, Google Maps tiles are no longer pre-rendered but are instead rendered on demand.
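
The tile scheme itself is well documented and easy to reproduce. The Python snippet below uses the standard ‘slippy map’ formula, as published for OpenStreetMap-style tile pyramids rather than anything Google-internal, to work out which tile at a given zoom level contains a latitude/longitude coordinate.

    import math

    # Standard web map tiling: at zoom level z the Web Mercator world is split
    # into 2^z x 2^z tiles. This is the widely documented OSM-style formula.

    def latlon_to_tile(lat_deg, lon_deg, zoom):
        """Return the (x, y) indices of the tile containing a WGS84 coordinate."""
        lat_rad = math.radians(lat_deg)
        n = 2 ** zoom
        x = int((lon_deg + 180.0) / 360.0 * n)
        y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
        return x, y

    # Tile indices for central London at zoom level 12.
    print(latlon_to_tile(51.5074, -0.1278, 12))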

Between 2005 and 2008 Google Maps licensed further data to obtain a full world map with more comprehensive coverage. At the same time Google were also working to acquire high resolution imagery.

Street View started in five US cities in 2007 but had expanded to 3,000 cities in 39 countries by 2012. In 2008 Google released Map Maker to capture data in places where basic mapping data and Street View coverage were absent.

Google’s Ground Truth project now enables them to generate their own maps from raw data by combining satellite and aerial imagery with road data and information captured via Street View. This data is processed with an application called ‘Atlas’ that Google developed internally. With the aid of advanced computer vision techniques they are able to detect and correct errors and extract further contextual information from the raw imagery, helping to make their maps more complete and accurate. This includes details as specific as the names of streets and businesses appearing on signs.

Corrections are also crowd-sourced from users with the aid of their ‘Report Maps Issue’ feature. Staff at Google are then able to verify the issue with Street View, edit the map and publish the corrections within minutes.

The presentation moves on to further discussions on ‘Google Maps For Good’ and their work with NGOs (19:20), ‘Google Maps for Mobile’ and the provision of offline map availability (27:35), the evolution of the equipment used to capture Street View (31:30), and finally the evolution of their 3D technology (37:40). The final discussion in particular reiterates the content in my post yesterday from a slightly different perspective.

What I found particularly interesting in this video was the continued manual intervention via Atlas but also the extent to which they are able to gather contextual information from Street View imagery.

3D Imagery in Google Earth

Since 2006 Google Earth has included textured 3D building models for urban areas. Initially these were crowd-sourced from enthusiastic members of the user community, who modelled them by hand with the aid of SketchUp (sold to Trimble in 2012) or the simpler Google Building Maker (retired in 2013). As the video above shows, from 2012 onward Google have instead been using aerial imagery captured at a 45 degree angle and employing photogrammetry to automate the generation of 3D building and landscape models. In the following video from the Nat and Friends YouTube channel, Google employees help explain the process.

As the video explains, Google Earth’s digital representation of the world is created with the aid of different types of imagery. For the global view, 2D satellite imagery is captured from above and wrapped around Google Earth’s virtual globe. The 3D data that appears when users zoom in to the globe is captured via aircraft.

Each aircraft carries five cameras: one faces directly downward while the others are aimed to the front, back, left and right of the plane at a 45 degree angle. By flying in a stripe-like pattern and taking consecutive photos with multiple cameras, the aircraft captures each location it passes from multiple directions. However, the need to obtain cloud-free images means that multiple flights have to be made, so the images captured for any single location may be taken days apart. The captured imagery is colour corrected to account for different lighting conditions, and for some areas finer details like cars are even removed.

The process of photogrammetry as employed by Google works by combining the different images of a location and generating a 3D geometric surface mesh. Computer vision techniques are used to identify common features within the different images so that they can be aligned. A GPS receiver on the aircraft also records the position from which each photograph was taken, enabling the calculation of the distance between the camera on the plane and any given feature within the photograph. This facilitates the creation of depth maps, which can be stitched together using the common features identified earlier to form a combined geometric surface mesh. The process is completed by texturing the mesh with the original aerial imagery. For regularly shaped objects like buildings this can be done very accurately with the aid of edge detection algorithms, which identify the edges of buildings in the imagery and help align them with the edges of features in the mesh. For organic structures this is more challenging.
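
The underlying geometry can be illustrated with the textbook stereo relation depth = focal length × baseline / disparity. The Python sketch below is a toy version of this single step, with made-up numbers; Google’s production pipeline combines vast numbers of such measurements into full depth maps before any stitching takes place.

    # Textbook stereo triangulation: depth = f * B / d. Purely illustrative,
    # with invented values; this is not Google's pipeline.

    def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
        """Estimate depth from the shift of a matched feature between two views.

        focal_length_px : camera focal length expressed in pixels
        baseline_m      : distance between the two camera positions in metres
        disparity_px    : shift of the matched feature between the images in pixels
        """
        if disparity_px <= 0:
            raise ValueError("feature must shift between views to be triangulated")
        return focal_length_px * baseline_m / disparity_px

    # A roof corner matched in two frames taken 120 m apart along the flight line,
    # shifting 320 px between them, with a focal length of 8,000 px:
    print(depth_from_disparity(8000, 120.0, 320), "m")  # -> 3000.0 m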

Google Earth includes imagery for many different levels of detail or zoom. According to the video the number of images required is staggering, on the order of tens of millions. While the zoomed-out global view in Google Earth is only fully updated once every few years, the aerial imagery for particular urban areas may be updated in less than a year. Gathered over time, this imagery enables users to observe changes, and this can be leveraged for analysis with the aid of Google’s Earth Engine.