6.1. Target architecture
Figure 9: Target architecture for the management of information resources
The majority of the City’s information resources are stored in databases linked to source systems. In the target architecture, the source systems are the operational systems of City divisions and municipally owned companies, and the systems of external operators, such as the Digital and Population Data Services Agency, Statistics Finland and the Helsinki Regional Transport Authority (HSL). In accordance with the City’s API policies, the connections between source systems and the City’s data platform will be primarily API-based, and every procured system must include APIs so that both existing data and data collected in the future can be used extensively. External data will be imported into the data platform using the methods offered by the external operators. If an operator’s data transfer methods do not meet the requirements of modern data processing, the City must demand that the operator develop them to a usable level.
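The connection rules above can be illustrated with a minimal sketch. It is not the City's implementation: the `SourceSystem` fields, the route labels and the example URL are hypothetical names introduced only for illustration. The sketch assumes that an API is always preferred, that an external operator's own transfer method is the fallback, and that anything else is flagged for follow-up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceSystem:
    """Hypothetical description of one source system (all fields are assumptions)."""
    name: str
    owner: str                             # "division", "company" or "external"
    api_url: Optional[str] = None          # None if the system exposes no API
    operator_method: Optional[str] = None  # transfer method offered by an external operator


def ingestion_route(source: SourceSystem) -> str:
    """Choose how data from a source system reaches the data platform."""
    # API-based connections are the primary route for every source system.
    if source.api_url is not None:
        return "api"
    # External data may be imported with the method the operator offers.
    if source.owner == "external" and source.operator_method is not None:
        return "operator_method"
    # No usable route: the system or operator must be asked to provide one.
    return "needs_development"


if __name__ == "__main__":
    examples = [
        SourceSystem("division_system", "division", api_url="https://api.example.invalid/v1"),
        SourceSystem("operator_feed", "external", operator_method="file_transfer"),
        SourceSystem("legacy_system", "company"),
    ]
    for source in examples:
        print(source.name, "->", ingestion_route(source))
```

Keeping the routing decision in one small function makes the API-first preference explicit and easy to audit.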
If necessary, raw data can also be transferred directly from source systems to the data platform for data exploration and analytical modelling. Exploratory methods of this kind are typical of data science (for example, neural network modelling) and are used to find latent connections and dependencies in data without knowing in advance exactly which data attributes affect the phenomenon being studied.
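As a rough illustration of exploring raw data without choosing attributes in advance, the sketch below ranks every attribute by the strength of its linear correlation with an outcome. Plain Pearson correlation is used here as a deliberately simple stand-in for the more capable methods mentioned above. The column names and values are made-up illustrative data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_attributes(columns, target):
    """Rank every non-target attribute by absolute correlation with the target."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)


# Made-up raw data: no attribute is singled out beforehand.
RAW_COLUMNS = {
    "attr_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "attr_b": [2.0, 1.0, 4.0, 3.0, 5.0],
    "attr_c": [5.0, 1.0, 4.0, 2.0, 3.0],
    "outcome": [2.0, 4.0, 6.0, 8.0, 10.0],
}

if __name__ == "__main__":
    print(rank_attributes(RAW_COLUMNS, "outcome"))
```

Every attribute is scored in the same way, so the ranking surfaces candidates for closer study rather than confirming a prior assumption.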
Once raw data has been transferred to the data platform, it should undergo pre-processing, which includes correcting errors, combining data and, where appropriate, aggregating it. Pre-processing also includes conceptual, logical and physical data modelling to facilitate consistent further processing. The data platform also includes a separate data lake for processing sensor data and other data that needs to be processed in real time. More structured data can be processed in a data warehouse.
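The three pre-processing steps named above (error correction, combination and aggregation) can be sketched as a small pipeline. This is a minimal example under stated assumptions, not a description of the City's platform: the record layout, the sensor register and the district labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw sensor readings; field names are illustrative assumptions.
RAW_READINGS = [
    {"sensor_id": "s1", "temp_c": "21.5"},
    {"sensor_id": "s1", "temp_c": "22.5"},
    {"sensor_id": "s2", "temp_c": "n/a"},   # value that cannot be parsed
    {"sensor_id": "s2", "temp_c": "19.0"},
    {"sensor_id": "s3", "temp_c": "18.0"},  # sensor missing from the register
]

# Hypothetical register that maps sensors to districts.
SENSOR_REGISTER = {"s1": "district_a", "s2": "district_b"}


def correct_errors(readings):
    """Error correction: drop readings whose value is not a number."""
    cleaned = []
    for row in readings:
        try:
            value = float(row["temp_c"])
        except ValueError:
            continue
        cleaned.append({"sensor_id": row["sensor_id"], "temp_c": value})
    return cleaned


def combine(readings, register):
    """Combination: attach the district; skip sensors not in the register."""
    return [
        {**row, "district": register[row["sensor_id"]]}
        for row in readings
        if row["sensor_id"] in register
    ]


def aggregate(readings):
    """Aggregation: mean temperature per district."""
    by_district = defaultdict(list)
    for row in readings:
        by_district[row["district"]].append(row["temp_c"])
    return {district: mean(values) for district, values in by_district.items()}


def preprocess(readings, register):
    """Run the three steps in order."""
    return aggregate(combine(correct_errors(readings), register))


if __name__ == "__main__":
    print(preprocess(RAW_READINGS, SENSOR_REGISTER))
```

Keeping each step as a separate function means the same cleaning and combination logic can feed either a real-time path or a warehouse load.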
Information resources will be combined and analysed in an analytics environment, which will include powerful tools for processing and combining information resources, as well as advanced analytics applications, such as the creation of forecasting models. Processed datasets, analyses and models will be published and made available for the City’s internal and external use via APIs to the extent permitted by the associated access rights.
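Publishing "to the extent permitted by the associated access rights" can be sketched as a filter applied before anything is exposed through an API. The access levels, dataset names and the `visible_datasets` function below are assumptions made for illustration. The source does not define a specific access model.

```python
from dataclasses import dataclass

# Hypothetical access levels, ordered from least to most restricted.
ACCESS_LEVELS = ("public", "internal", "restricted")


@dataclass(frozen=True)
class Dataset:
    """A published dataset, analysis or model (illustrative structure)."""
    name: str
    access_level: str  # one of ACCESS_LEVELS


def visible_datasets(catalogue, caller_level):
    """Return the names of datasets that the caller's access level permits."""
    if caller_level not in ACCESS_LEVELS:
        raise ValueError(f"unknown access level: {caller_level!r}")
    # A caller may see its own level and every less restricted level.
    allowed = ACCESS_LEVELS[: ACCESS_LEVELS.index(caller_level) + 1]
    return [d.name for d in catalogue if d.access_level in allowed]


# Made-up catalogue entries.
CATALOGUE = [
    Dataset("population_forecast", "public"),
    Dataset("service_usage_model", "internal"),
    Dataset("client_level_records", "restricted"),
]

if __name__ == "__main__":
    for level in ACCESS_LEVELS:
        print(level, "->", visible_datasets(CATALOGUE, level))
```

An external caller with the `public` level would see only the first entry, while internal users would also see the second. Applying the filter in one place keeps internal and external APIs consistent.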