GeoLocation
GeoLocation is a Kotlin library designed to support the identification of geo-locations in a text.
GeoLocation is part of KotlinNLP.
Getting Started
Import with Maven
<dependency>
    <groupId>com.kotlinnlp</groupId>
    <artifactId>geolocation</artifactId>
    <version>0.2.4</version>
</dependency>
Dictionary model
Download a serialized model of the LocationsDictionary here.
Note
All the cities of our model are taken from the OpenStreetMap database. If you need to go back to the original locations you can find a mapping between our ids and OSM ones here.
The locations JSON line file
A JSON Lines file, with one location per line.
Locations hierarchy
Locations have one of the following types, that are deduced from their ID:
- city (ID: "XXXX?????XXXX")
- admin_area_1 (ID: "XXXX??XXX0000")
- admin_area_2 (ID: "XXXXXX0000000")
- country (ID: "XXXX000000000")
- continent (ID: "X000000000000")
- region (ID: "0X00000000000")
("X" stands for a hex digit from 1 to F, "?" stands for "0" or "X")
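The type deduction can be sketched as follows. This is a minimal sketch, not the library's own implementation: it assumes the group widths given in the "Location ID" section (1, 1, 2, 2, 3 and 4 digits) and assumes that the type is the deepest group that is not all zeros. The names `ID_GROUP_WIDTHS`, `LocationType` and `locationTypeOf` are hypothetical.

```kotlin
// Widths of the six ID groups, in hierarchy order:
// continent, region, country, admin_area_2, admin_area_1, city.
val ID_GROUP_WIDTHS = listOf(1, 1, 2, 2, 3, 4)

// Hypothetical enum, ordered like the ID groups (not part of the library API).
enum class LocationType { CONTINENT, REGION, COUNTRY, ADMIN_AREA_2, ADMIN_AREA_1, CITY }

// Deduces the type of a location from its 13-digit hexadecimal ID.
fun locationTypeOf(id: String): LocationType {
    require(id.length == ID_GROUP_WIDTHS.sum()) { "Expected a 13-digit ID, got '$id'" }

    // Split the ID into its six groups.
    var start = 0
    val groups = ID_GROUP_WIDTHS.map { width ->
        id.substring(start, start + width).also { start += width }
    }

    // Assumption: the type corresponds to the deepest group that is not all zeros.
    val deepest = groups.indexOfLast { group -> group.any { it != '0' } }
    require(deepest >= 0) { "An ID made only of zeros has no type" }

    return LocationType.values()[deepest]
}
```

With this sketch, `locationTypeOf("52A30012F04DD")` yields `CITY` and `locationTypeOf("52A3000000000")` yields `COUNTRY`.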
They are distributed following this hierarchy:
continent    region
    |___________|
          |
       country
          |______________
          |              |
          |        admin_area_2
          |______________|
          |       |
          |  admin_area_1
          |_______|
          |
        city
Note: as shown in the graph, the admin_area_1 and the admin_area_2 are optional in the hierarchy of a lower level location.
Location properties
Each location is represented by a list with 16 elements, each representing a property:
- id: String. The ID.
- unlocode: String, nullable. The United Nations Code for Trade and Transport Locations (UN/LOCODE).
- iso-a2: String, nullable. Defined only for type 'country'. The ISO 3166-1 alpha-2 code of the country.
- sub-type: String, nullable. Defined only for some locations of type 'country', 'admin_area_2', 'admin_area_1' and 'city'. The sub-type (the type is deduced from the id).
- name: String, nullable. The main name.
- name translations: Object, nullable. Defined only for some locations of type 'country', 'admin_area_1' and 'city'. An object containing the name translations, keyed by ISO 639-1 language code.
- other names: List, nullable. Defined only for some locations of type 'country', 'admin_area_2' and 'admin_area_1'. A list of other names.
- demonym: String, nullable. Defined only for some locations of type 'country' and 'admin_area_1' (only 'West Bank' and 'Gaza Strip'). The demonym.
- lat: Int, null for type 'region'. The latitude coordinate.
- lon: Int, null for type 'region'. The longitude coordinate.
- borders: List of String, nullable. Defined only for some locations of type 'country'. A list of the ids of the adjacent countries.
- is capital: Boolean, nullable. Defined only for type 'city'. Whether a city is the capital of its country.
- area: Int, nullable. Defined only for type 'country'. The area of the country in km^2.
- population: Int, nullable. Defined only for some locations of type 'country', 'admin_area_1' and 'city'. The population.
- languages: List of String, nullable. Defined only for some locations of type 'country' and 'admin_area_1'. The ISO 639-1 codes of the languages spoken in the location.
- altDivisions: List of List, nullable. Defined only for some locations of type 'city'. A list of alternative divisions, each containing type (String), name (String) and level (Int), in this order.
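As an illustration, the 16 positional values of one line could be mapped to named properties as in the sketch below. It makes several assumptions: the line has already been decoded into a `List<Any?>` by a JSON parser of your choice, the values appear in the order in which the properties are listed above, and numeric values are kept as `Number` because the concrete type depends on the parser. `LocationRecord` and `toLocationRecord` are hypothetical names, not part of the library API.

```kotlin
// Hypothetical container for one decoded location (illustrative names, not the library API).
data class LocationRecord(
    val id: String,
    val unlocode: String?,
    val isoA2: String?,
    val subType: String?,
    val name: String?,
    val translations: Map<String, String>?,
    val otherNames: List<String>?,
    val demonym: String?,
    val lat: Number?,
    val lon: Number?,
    val borders: List<String>?,
    val isCapital: Boolean?,
    val area: Number?,
    val population: Number?,
    val languages: List<String>?,
    val altDivisions: List<List<Any?>>?
)

// Maps the 16 positional values of one line to named properties.
// Assumes the values follow the order of the property list in this section.
@Suppress("UNCHECKED_CAST")
fun toLocationRecord(fields: List<Any?>): LocationRecord {
    require(fields.size == 16) { "Expected 16 properties, got ${fields.size}" }
    return LocationRecord(
        id = fields[0] as String,
        unlocode = fields[1] as String?,
        isoA2 = fields[2] as String?,
        subType = fields[3] as String?,
        name = fields[4] as String?,
        translations = fields[5] as Map<String, String>?,
        otherNames = fields[6] as List<String>?,
        demonym = fields[7] as String?,
        lat = fields[8] as Number?,
        lon = fields[9] as Number?,
        borders = fields[10] as List<String>?,
        isCapital = fields[11] as Boolean?,
        area = fields[12] as Number?,
        population = fields[13] as Number?,
        languages = fields[14] as List<String>?,
        altDivisions = fields[15] as List<List<Any?>>?
    )
}
```

Named properties avoid scattering positional indexes through the calling code. If the actual order of the values differs from the one assumed here, only the mapping function needs to change.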
Location ID
A string containing a number in hexadecimal format, composed of 13 digits divided into 6 groups, each representing a level of the locations hierarchy.
An example:
"52A30012F04DD"
5    2    A3   00   12F  04DD
(1)  (2)  (3)  (4)  (5)  (6)
Groups:
- Continent (1 digit)
- Region (1 digit)
- Country (2 digits)
- Admin Area 2 (2 digits)
- Admin Area 1 (3 digits)
- City (4 digits)
Note: in the example, the country related to that location has ID "52A3000000000" (it is enough to replace the sections of the lower levels with zeros).
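The grouping and the zero-replacement rule from the note can be sketched as follows. This is only an illustration built on the group widths listed above. The names `LEVEL_WIDTHS`, `splitId`, `truncateId` and `countryIdOf` are hypothetical and are not taken from the library.

```kotlin
// Group widths in hierarchy order: continent, region, country,
// admin_area_2, admin_area_1, city (13 hex digits in total).
val LEVEL_WIDTHS = listOf(1, 1, 2, 2, 3, 4)

// Splits a 13-digit ID into its six groups.
fun splitId(id: String): List<String> {
    require(id.length == LEVEL_WIDTHS.sum()) { "Expected a 13-digit ID, got '$id'" }
    var start = 0
    return LEVEL_WIDTHS.map { width ->
        id.substring(start, start + width).also { start += width }
    }
}

// Keeps the first `levels` groups and replaces the digits of the lower levels with zeros.
fun truncateId(id: String, levels: Int): String {
    require(id.length == LEVEL_WIDTHS.sum()) { "Expected a 13-digit ID, got '$id'" }
    require(levels in 1..LEVEL_WIDTHS.size) { "levels must be between 1 and 6" }
    val kept = LEVEL_WIDTHS.take(levels).sum()
    return id.take(kept).padEnd(id.length, '0')
}

// The country is the third level of the hierarchy.
fun countryIdOf(id: String): String = truncateId(id, 3)
```

For the example ID, `splitId("52A30012F04DD")` gives the groups `5`, `2`, `A3`, `00`, `12F`, `04DD`, and `countryIdOf("52A30012F04DD")` gives `"52A3000000000"`, matching the note.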
Location sub-types
Here is a list of the possible values of the sub-type property:
- "administrative county"
- "administrative state"
- "administrative zone"
- "automonous region"
- "autonomous city"
- "autonomous commune"
- "autonomous community"
- "autonomous monastic state"
- "autonomous province"
- "autonomous region"
- "autonomous republic"
- "autonomous sector"
- "autonomous territory"
- "canton"
- "capital"
- "capital city"
- "capital district"
- "capital metropolitan city"
- "capital region"
- "capital territory"
- "captial district"
- "centrally administered area"
- "circuit"
- "city"
- "city|municipality|thanh pho"
- "commissiary"
- "commune|municipality"
- "country"
- "county"
- "departamento"
- "department"
- "dependency"
- "district"
- "district|regencies"
- "division"
- "economic prefecture"
- "emirate"
- "federal dependency"
- "federal district"
- "federal republic"
- "federal state"
- "federal territory"
- "governorate"
- "hamlet"
- "highly urbanized city"
- "independent city"
- "independent component city"
- "independent municipality"
- "independent town"
- "indigenous territory"
- "intendancy"
- "metropolis"
- "metropolitan city"
- "municipality"
- "municipality|governarate"
- "municipality|prefecture"
- "national capital area"
- "national district"
- "national territory"
- "neutral city"
- "parish"
- "prefecture"
- "préfecture"
- "province"
- "provincial city"
- "quarter"
- "region"
- "regional council"
- "republic"
- "republican city"
- "sovereign state"
- "special city"
- "special district"
- "special municipality"
- "special region"
- "special self-governing city"
- "state"
- "statistical region"
- "suburb"
- "territorial unit"
- "territory"
- "town"
- "township"
- "union territory"
- "unitary authority"
- "urban county"
- "urban prefecture"
- "usa territory"
- "village"
- "voivodeship|province"
License
This software is released under the terms of the Mozilla Public License, v. 2.0.
Contributions
We greatly appreciate any bug reports and contributions, which can be made by filing an issue or making a pull request through the GitHub page.