November 29, 2015 GEO mapnik PostgreSQL/PostGIS Python SQLite/Spatialite
In this post I describe how I convert and import OpenStreetMap data to a SQLite/Spatialite database. The data is prepared such that it ready to be used with mapnik and the OSMBright style. This approach is chosen for a customer and is used in production since 2013. The OSM data is periodically refreshed, and the accompanying tile server periodically refreshes tiles which include changed data.
First of all, why would you import it to a SQLite database and not just straight into a PostgreSQL/PostGIS database? One of the answers here is simplicity. Setting up PostgreSQL isn’t that hard, but requires knowledge about how to set it up and it requires a machine to actually run PostgreSQL on.
Another answer is easy maintenance. You can simply copy the SQLite database to another machine and you have it there as well. No need to run the actual import of the OSM data again, or export and import the PostgreSQL database itself. Simply copying the SQLite database is sufficient.
Also, using SQLite it is very easy to compare two databases using the ATTACH command. If you have an old and a new SQLite database, you can simply run a query comparing the two. For example, you can compare existing records by running a query with an INNER JOIN against the same table in the old database and the new database, matching on OSM ID. If the OSM version is changed, the record is updated. Using PostgreSQL, it is harder to compare tables in two different databases. Note that what we want to achieve, keeping track of changed tiles, is usually done by importer tools such as osm2pgsql.
Another possibility, but not actually verified/bench marked, is that querying a SQLite database requires less context switches and be faster as a result. When a client fires a query to a PostgreSQL server, the client connects to PostgreSQL, the client transmits the query, the database executes the query, the database serializes the result and transmits it back to the client, the client deserializes the data. When using SQLite, the database is already ‘in’ the program and there is no need to create a connection, and serialize/deserialize any data.
Note however, that this also has downsides. Some queries which run on mapnik/PostgreSQL cannot be executed on mapnik/SQLite. Also, the OSMBright style is written specifically for the PostgreSQL database, resulting in additional changes to the OSMBright style itself.
To achieve this, the data needs to be converted from OSM PBF file format to SQLite/Spatialite. GDAL/OGR provides tools to do this via the OSM driver and the SQLite/Spatialite driver. The resulting SQLite database is crude and not yet usable for the OSMBright style.
We need to add additional instructions to the OSM Driver, such as adding a osm_version column to the resulting data. This is done through this osmconf.ini file. We need this, for example, to be able to find changed tiles later on, so that these are re-rendered.
The command to convert the OSM data to the SQLite database is as follows:
$ ogr2ogr -f SQLite output.sqlite input.osm.pbf -progress -dsco SPATIALITE=YES -gt 65536 --config OSM_CONFIG_FILE ./osmconf.ini -lco SPATIAL_INDEX=NO
Note that we do not create a spatial index (SPATIAL_INDEX=NO), since the data is transformed using the Imposm mapping later on. Creating an index at this point is a waste of disk space and processing power, since the resulting tables are never used directly.
Next, we convert the ‘raw’ data in the SQLite database to something usable for the OSMBright style. The data is read from the tables created by OGR and transformed and filtered into tables following the Imposm mapping. The used Imposm mapping is included in the OSMBright style, thankfully. Using the mapping, we can generate a SQL script to do the conversion, which can be found here: osm_ogr_to_imposm.sql. After running this script, we end up with several tables, containing the resulting ready-to-be-used data.
OSMBright however, it specifically written towards PostgreSQL. As stated before, some queries written for mapnik/PostgreSQL cannot be executing on mapnik/SQLite. Usually, these queries are easily ‘fixed’ and SQLite executes these without any problems. For example, some queries use this snippet in the WHERE clause: ‘geometry && !bbox!’. Mapnik converts this snippet to into a expressions which filters on a bounding box. Filtering on the bounding box early has a big advantage, since a lot of data can be discarded very early in the process, saving processing power.
The resulting style worked fine, but I wanted to optimize things a bit further. For example, the layer ‘area_label’ contains an elaborate query, using several sub-queries and WHERE-clauses. We can optimize this query by creating a materialized view. Or rather, run the query on the source tables, store the result in another table, and let mapnik use that table instead of the complex query. Note that this will grow the database size on disk, but makes the mapnik style a bit simpler.
More optimization can be found by enabling the debugging/logging facilities provided by mapnik. The executed queries are shown when rendering a map. We can now use the SQLite EXPLAIN feature to investigate bottlenecks.
The refreshing of the resulting tiles is done periodically. A complete new OSM Data-set is downloaded (The Netherlands only), converted using the approach described above. Then, the old and new databases are compared, using the following methods:
- Find newly created records;
- Find deleted records;
- Find changed records.
Step 1 and 2 simply execute a query using a LEFT JOIN against the tables in the new and old databases. Step 3 executes a query using an INNER JOIN against the tables in the new and old databases. If the osm_version was changed, then the record was updated. All the tiles on which this record/geometry can be found on are enqueued to be re-rendered. Note that a record might have been changed in a way unimportant to us (for example, an attribute not being use by the style was changed), but we want to be on the safe side here, so we simply re-render the tile. A fairly simple, but effective method.
In conclusion, this post describes how I created a means of importing OSM data into an SQLite database and render tiles from it using mapnik/the OSMBright style. Furthermore, a method to find tiles to be re-freshed is shown. Hopefully this approach might be usable by others, or might offer new insights.