Preservation of urban heritage is one of the main challenges for contemporary society. It is closely connected with several dimensions: global-local rhetoric, cultural tourism, armed conflicts, immigration, cultural change, investment flows, infrastructure development, and so on. Organizations responsible for heritage management often have to deal with a lack of resources that are crucial for proper heritage preservation, maintenance, and protection. This is particularly problematic for countries with low GDP or an unstable political situation.
A possible solution to these problems could be an automated heritage monitoring software system based on 3D data and AI technologies, which would increase monitoring efficiency in terms of cost, time, and data objectivity. A system prototype was developed and tested by Vilnius University and Terra Modus Ltd. within the project “Creation of automated urban heritage monitoring software prototype” (2014). The next step is the creation of full-capability software, which is under development by Vilnius University within the project “Automated urban heritage monitoring implementing 3D and AI technologies”, financed by the Research Council of Lithuania (2018-2022). This paper presents only the general pipeline of the first stage of the project.
The proposed digital monitoring technique is based on efficient reality capture and the comparison of data over time. 3D laser scanning and digital photogrammetry are the most capable and sufficiently accurate data collection methods. Information collected at different points in time can serve as input for artificial intelligence analysis, which can automatically identify the relevant valuable elements and their changes during a particular period. Such monitoring can be performed in a remote, non-destructive, and cost-effective way. Accordingly, the main principles of the suggested solution are listed below.
Digital monitoring is based on seven conditions. First, all objects in the monitoring process are tangible. Second, physical valuables can be expressed as simple geometrical forms or mathematical expressions. Third, monitored objects can be fully scanned or photogrammetrically processed. Fourth, data from lidar devices and data derived from photogrammetry are of the same quality (density, coverage, etc.). Fifth, cultural heritage can be detected by statistical and machine learning algorithms. Sixth, digitally processed results must be verifiable. Seventh, digital monitoring relies on non-destructive and non-invasive 3D capture and analytical technologies.
Regarding the digital data, there are two possible scenarios for the detection and comparison of selected valuables. The first scenario is characterized by a lack of comparable data on the older status quo, i.e. no earlier 3D data of the selected cultural heritage exist. In this case, newly collected data are compared against mathematical rules that can be expressed in code; this set of rules describes the geometrical parameters of the selected valuables of the cultural heritage. In the second scenario, two data sets from different time periods exist and are compared with each other. In both cases, the comparison needs interpretation.
The first level of interpretation demonstrates facts of geometrical change. The second level depends on the particular legal status and local legislation for managing cultural heritage, i.e. the meaning of a detected change depends on the legislation. The first level of interpretation can be expressed with logical operators, for example an alteration described as “status quo unchanged” or “reduction in volume by 65%”. The second level can be a legal analysis of the first-level results, for example “reduction in volume = fact of illegal demolition works”.
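As an illustration, the two interpretation levels can be sketched in a few lines of Python. The function names, the 2% measurement-noise tolerance, and the legal mapping are hypothetical assumptions for this sketch; a real second-level mapping would follow the applicable local legislation.

```python
def classify_change(old_volume, new_volume, tolerance=0.02):
    """First-level interpretation: state a geometrical change as a fact.

    `tolerance` (2% here, an illustrative value) absorbs measurement noise.
    """
    if old_volume <= 0:
        raise ValueError("old_volume must be positive")
    ratio = new_volume / old_volume
    if abs(ratio - 1.0) <= tolerance:
        return "status quo unchanged"
    if ratio < 1.0:
        return f"reduction in volume by {round((1 - ratio) * 100)}%"
    return f"increase in volume by {round((ratio - 1) * 100)}%"


def legal_interpretation(change, permits_issued=False):
    """Second-level interpretation: map a detected fact onto a legal status.

    This mapping is purely hypothetical; the actual meaning of a change
    depends on local heritage legislation.
    """
    if change.startswith("reduction") and not permits_issued:
        return "possible illegal demolition works"
    if change != "status quo unchanged" and not permits_issued:
        return "possible unauthorised alteration"
    return "no violation detected"
```

For example, comparing two volume measurements of 100 m³ and 35 m³ yields “reduction in volume by 65%” at the first level, which the second level turns into “possible illegal demolition works” when no permits were issued.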
According to the most frequent alterations of the valuables of Vilnius Old Town’s buildings, the following list can be stated: a) elements of the roof; b) shapes of the roof; c) cornices; d) doors; e) gates; f) the primary height and width of buildings; g) the primary housing intensity of a site; h) windows; i) chimneys. These are the main valuables whose geometrical changes can be traced.
In order to perform the detection of valuables, the AI algorithms first need to be trained to identify the desired valuables in the data, i.e. 2D pictures or 3D point clouds. Google’s TensorFlow with DeepLab v3+ and default settings was used.
These are semantic segmentation procedures in which already annotated and trained data can be reused. However, very little high-quality open data exists for this topic. Hence, a new database was established for performing the digital monitoring processes. For future use of the software in other European old towns, only an additional database of 2D pictures of elements or 3D scans would be needed.
The newly established database consists of pictures collected from the main streets of Vilnius Old Town. Labelbox is used for data annotation. Currently there are 420 high-resolution photos (12 megapixels), in which the first two classes (valuables) have been created: windows and doors. All doors and windows in the 420 photos were annotated manually, so that an algorithm can learn which pixels denote windows and which pixels denote doors. For the training task, the currently most capable open-source algorithms of Google’s TensorFlow were used. The result of annotation is an XML file: the annotated information is described according to the Pascal VOC standard, one of the most popular and widely used. To sum up, two types of files are exported from Labelbox: XML and JPG. The further process can be described as follows:
1. The JPG and XML files are converted into RGB; the results are PNG files with segmentation masks (SegmentationClass);
2. Additionally, PNG raw files with the semantic segmentation object contours are exported (SegmentationClassRaw);
3. The JPG files, PNG files (SegmentationClass), and PNG raw files (SegmentationClassRaw) are manually separated into two parts: “Train” (for training) and “Val” (for validation). The Train part is also automatically separated into training and test parts in order to assess how closely the training results match human manual annotation. Hence, Train, Val, and TrainVal index files are generated;
4. According to the index of JPG, PNG, and PNG (raw) files, files in the special format required by TensorFlow training are generated (TFRecord: Train, Val, and TrainVal);
5. The system is trained using the TFRecord files. In order to obtain the most accurate results, many hyperparameters should be optimized. This process is analysed in detail by J. Bergstra and Y. Bengio.
One of the biggest problems in hyperparameter optimization is overfitting. In the context of heritage monitoring, this would mean that newly presented valuables, windows for example, could not be identified properly. In order to avoid overfitting, various techniques can be applied, e.g. early stopping: once progress shows that the error has stopped decreasing, training is stopped. The measure of prediction quality is described by a loss function. There are various methods for calculating the loss function, but in this experiment the default cross entropy is used. The experiment results demonstrated that training progressed properly, because the loss function gradually decreased and the data were not overfitted. However, powerful computing resources are needed to finalize the whole experiment with all groups of valuables.
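The early-stopping rule and the cross-entropy loss mentioned above can be sketched as follows. The class name, the patience value, and the simplified per-pixel loss are illustrative; in practice DeepLab computes its cross-entropy loss inside TensorFlow rather than with stdlib code like this.

```python
import math


class EarlyStopping:
    """Stop training once the validation loss stops improving for
    `patience` consecutive evaluations (the early-stopping rule above).

    The patience and min_delta values are illustrative choices.
    """

    def __init__(self, patience=3, min_delta=1e-3):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss       # meaningful improvement: reset counter
            self.bad_evals = 0
        else:
            self.bad_evals += 1        # no improvement this evaluation
        return self.bad_evals >= self.patience


def pixel_cross_entropy(probs, labels):
    """Mean per-pixel cross-entropy: `probs` holds one per-class probability
    list per pixel, `labels` the true class index of each pixel."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)
```

A training loop would call `step()` after each validation pass and break out of the loop as soon as it returns True, keeping the checkpoint with the best (lowest) validation loss.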
To sum up, the presented project is still at an early stage; however, the first laboratory experiments with the primary version of the pooled data resource achieved 80% accuracy in the semantic segmentation of objects into two classes (windows and doors), which suggests that the chosen technology solutions and the developed methodology can be adapted successfully to achieve the project objectives.