Indonesia Capital Cities Project
By Dr. Karen Payne, ITOS Humanitarian Program
This dataset was constructed by GISCorps volunteers in partnership with the University of Georgia, Information Technology Outreach Services (ITOS). The motivation for this project came from users of the GIST Data Repository at ITOS, who contacted the repository administrator and asked whether a dataset showing the capital cities of Indonesia existed. Two types of users made this request: staff of US Federal Government agencies and members of academic institutions. Researching the request made it clear that no easily accessible point dataset of the current capital cities of Indonesia existed. Because ITOS has a mandate to proactively create primary datasets used by the humanitarian community, we decided to create one and distribute it through our repository. ITOS therefore contacted GISCorps and asked for volunteers who could help find the latitude and longitude of all Province Capitals, Regency Capitals, and autonomous cities (Kotas) in Indonesia. Three volunteers were recruited and did a fantastic job creating the dataset: Paula Dillon (Idaho), Daisy Harsa (Texas), and Gary Hunter (Australia).
Administratively, Indonesia is governed at the following levels:
- Admin0 = Indonesia
- Admin1 = Region
- Admin2 = Province. Every Province has a Provincial Capital.
- Admin3 = Regency (kabupaten, sometimes translated as “district”) and Autonomous Cities (Kota, sometimes translated as “municipality”). Regencies and Kotas are considered the same administrative level (Admin3); although a Kota is spatially located within a Regency, the two have separate governments. Sometimes the capital of a Regency is also a Kota, but more often than not the Regency capital is a different city from any of the autonomous cities within the Regency. Similarly, a Provincial Capital may also be a Kota, but more often than not it is a different city from any of the autonomous cities within the Province.
- Admin4 = Subdistricts (Kecamatan): polygons, each incorporating many villages, that were not mapped in this exercise.
A target list of cities to locate was generated from a set of Wikipedia pages as they existed on 4 July 2013 (primarily http://en.wikipedia.org/wiki/List_of_regencies_and_cities_of_Indonesia, http://en.wikipedia.org/wiki/Provinces_of_Indonesia, and their linked pages) and from the Embassy of the Republic of Indonesia. After the target list was compiled, a range of sources was searched for geocodes, including:
- Geonames Server http://geonames.nga.mil/ggmagaz/
- OSM http://www.openstreetmap.org/ (you can get the latitude/longitude from the permalink URL, or download and parse the XML, or, in the Edit tab, choose “Browse Map Data,” select a node, and choose “Details” to get the coordinates and check the other admin boundaries).
- Foursquare https://foursquare.com/
- Wikipedia: maps and links on this page help confirm that you are in the correct general area of the country: http://en.wikipedia.org/wiki/List_of_regencies_and_cities_of_Indonesia
- Ungeoreferenced Topo Series: http://www.lib.utexas.edu/maps/ams/indonesia/
- PDF maps from the Government of Indonesia, which can be downloaded via a Google Translate link. Credit the source as the Center for Research, Promotion and Cooperation Bureau of Geospatial Information (BIG), Government of Indonesia.
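The OSM tip in the list above (reading coordinates from a permalink, or downloading and parsing the XML) can be sketched in a few lines of Python. This is a minimal sketch, not a tool the volunteers used: it assumes the current `#map=zoom/lat/lon` permalink fragment, the older `?mlat=...&mlon=...` or `?lat=...&lon=...` query form, and OSM XML in which each `<node>` carries `lat`/`lon` attributes and `<tag k="name" v="..."/>` children. The function names `coords_from_permalink` and `named_nodes` are hypothetical, and the sample XML is made up around the Simpang Empat coordinates quoted later in this report.

```python
from urllib.parse import urlparse, parse_qs
import xml.etree.ElementTree as ET


def coords_from_permalink(url):
    """Return (lat, lon) from an OSM permalink URL, or None if none is found.

    Handles the '#map=zoom/lat/lon' fragment and the older
    '?mlat=...&mlon=...' or '?lat=...&lon=...' query parameters.
    """
    parts = urlparse(url)
    # Fragment form, e.g. https://www.openstreetmap.org/#map=15/0.083/99.817
    fragment = parse_qs(parts.fragment)
    if "map" in fragment:
        fields = fragment["map"][0].split("/")
        if len(fields) == 3:
            return float(fields[1]), float(fields[2])
    # Query form: prefer the marker position (mlat/mlon) over the map centre
    query = parse_qs(parts.query)
    for lat_key, lon_key in (("mlat", "mlon"), ("lat", "lon")):
        if lat_key in query and lon_key in query:
            return float(query[lat_key][0]), float(query[lon_key][0])
    return None


def named_nodes(osm_xml):
    """Yield (name, lat, lon) for each <node> in an OSM XML string that has a name tag."""
    root = ET.fromstring(osm_xml)
    for node in root.iter("node"):
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        if "name" in tags:
            yield tags["name"], float(node.get("lat")), float(node.get("lon"))


if __name__ == "__main__":
    print(coords_from_permalink("https://www.openstreetmap.org/#map=15/0.083/99.817"))
    # (0.083, 99.817)
    sample = ('<osm><node id="1" lat="0.083" lon="99.817">'
              '<tag k="name" v="Simpang Empat"/></node></osm>')
    print(list(named_nodes(sample)))
    # [('Simpang Empat', 0.083, 99.817)]
```

A named node is only a candidate: the same name can occur in several places, so the result still needs the administrative-boundary check described in this report.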
Volunteers also used contextual data to help learn the geography of the area, including:
- Vector data for the country (roads, rivers, islands): www.lemigas.esdm.go.id
When possible, multiple sources were used to confirm locations. After geocodes were captured, the data were compared to several administrative unit databases (GAUL, GADM, SALB, and ISCGM, http://www.iscgm.org) to make sure each city was located within the correct administrative boundary. ISCGM had the best available administrative boundaries for Indonesia.
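Checking that a geocode falls inside the correct administrative boundary is, underneath any GIS tool, a point-in-polygon test. The sketch below uses the standard even-odd (ray-casting) rule; it is an illustration, not the procedure ITOS documented. It assumes each boundary is a single simple polygon given as a list of (lon, lat) vertices in the same geographic coordinates as the points (no holes or multipart islands, which real Indonesian boundaries certainly have). The names `point_in_polygon` and `find_admin_unit`, and the two rectangular "units", are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Even-odd (ray-casting) test: is (lon, lat) inside the polygon ring?

    `ring` is a list of (lon, lat) vertices; the closing vertex may be omitted.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_admin_unit(lon, lat, units):
    """Return the names of all units (a dict of name -> ring) containing the point."""
    return [name for name, ring in units.items() if point_in_polygon(lon, lat, ring)]


if __name__ == "__main__":
    # Two made-up rectangular units, for illustration only
    units = {
        "Box A": [(99.0, -1.0), (101.0, -1.0), (101.0, 1.0), (99.0, 1.0)],
        "Box B": [(101.0, -1.0), (103.0, -1.0), (103.0, 1.0), (101.0, 1.0)],
    }
    print(find_admin_unit(99.817, 0.083, units))  # ['Box A']
```

A result that is empty, lists more than one unit, or names a unit other than the expected Regency or Province marks the city for manual review.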
The task of locating these cities can be complicated by the following factors:
- The same city name often appeared in different forms in different sources. For example, “Pangkalan Balai” may be written as “Pangkalanbalai,” or a different vowel may be used (an “a” instead of an “e,” for example).
- Abbreviations are also possible. For example, “Tanjung Balai” is often abbreviated “Tg. Balai.”
- The same name is often used in more than one area of Indonesia (and other countries) so each city was checked to make sure it was in the right Province, Regency, or Region of the right country. For example, Simpang Empat is the capital of West Pasaman Regency in Sumatra (99.817 E, 0.083 N), but it is also the name of multiple other villages in Indonesia and Malaysia.
- Because Indonesia crosses the equator, volunteers were vigilant about the sign of latitude values (positive north of the equator, negative south of it).
- Many capitals share their names with subdistricts (Kecamatan).
- In some cases, the geocode given for a city was actually the centroid of the administrative polygon it governs. This became evident in imagery: the location showed no sign of a population center and lay outside any built-up area. In such cases, an alternate source was used to locate the city.
- Similarly, some sources truncate latitude and longitude to whole degrees, dropping the minutes and seconds and placing the city far from its true location. Again, alternate sources were used to locate such cities more accurately.
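Several of the pitfalls listed above can be screened for automatically before manual review. This is a minimal sketch under stated assumptions, not the volunteers' procedure: the abbreviation table holds only the “Tg.” example from the text, the 0.85 similarity threshold for tolerating vowel differences is an arbitrary choice, and the bounding box for Indonesia (roughly 11° S to 6° N, 95° E to 141° E, padded slightly) is approximate. The helper names `normalize_name`, `names_match`, and `coordinate_flags` are hypothetical. Flags are prompts for a human check, not corrections.

```python
import re
from difflib import SequenceMatcher

# Abbreviation patterns to expand before comparing names (assumed; only the
# example from the text is included)
ABBREVIATIONS = {r"\btg\.": "tanjung "}

# Approximate extent of Indonesia in decimal degrees (an assumption, padded)
LAT_RANGE = (-11.5, 6.5)
LON_RANGE = (94.5, 141.5)


def normalize_name(name):
    """Lower-case, expand abbreviations, and drop spaces and punctuation."""
    key = name.lower()
    for pattern, full in ABBREVIATIONS.items():
        key = re.sub(pattern, full, key)
    return re.sub(r"[^a-z]", "", key)


def names_match(a, b, threshold=0.85):
    """True if two spellings probably refer to the same name.

    Exact match after normalization, or a similarity ratio at or above
    `threshold` to tolerate a differing vowel.
    """
    ka, kb = normalize_name(a), normalize_name(b)
    return ka == kb or SequenceMatcher(None, ka, kb).ratio() >= threshold


def coordinate_flags(lat, lon):
    """Return warnings for a geocode given in decimal degrees (north and east positive)."""
    flags = []
    if not LAT_RANGE[0] <= lat <= LAT_RANGE[1]:
        # A flipped N/S sign can push a point out of range, but only far from the equator
        flags.append("latitude outside Indonesia; check N/S sign")
    if not LON_RANGE[0] <= lon <= LON_RANGE[1]:
        flags.append("longitude outside Indonesia")
    if float(lat).is_integer() or float(lon).is_integer():
        # Whole-degree values suggest minutes and seconds were truncated
        flags.append("whole-degree coordinate; possibly truncated")
    return flags


if __name__ == "__main__":
    print(names_match("Tg. Balai", "Tanjung Balai"))          # True
    print(names_match("Pangkalan Balai", "Pangkalanbalai"))   # True
    print(coordinate_flags(8.5, 115.2))   # ['latitude outside Indonesia; check N/S sign']
    print(coordinate_flags(-0.083, 99.817))  # [] : a sign flip near the equator is not caught
```

The last call shows why the bounding box cannot replace the boundary check: near the equator a flipped sign still lands inside Indonesia. Fuzzy matching likewise cannot tell apart identically named places in different provinces, so the Province or Regency must still be compared.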
A staff member at ITOS conducted quality assurance on the geocoded dataset by:
- Double-checking the target city list against the Wikipedia source pages to make sure no cities were missed or duplicated
- Locating any cities the volunteers were unable to find, or whose locations the volunteers considered questionable
- Comparing the location of each city to the best available administrative unit data (from ISCGM) to make sure it fell inside its respective administrative unit
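The first QA step, checking the target list against the source pages for missed or duplicated cities, amounts to a set comparison on normalized names. The sketch below is illustrative only: it assumes both lists are available as (province, city) pairs, keys each city by its province because the same name recurs across Indonesia, and normalizes by lower-casing and stripping spaces and punctuation. The function `qa_report` and the sample lists are hypothetical.

```python
import re
from collections import Counter


def _clean(text):
    """Lower-case and keep letters only, so spacing and punctuation are ignored."""
    return re.sub(r"[^a-z]", "", text.lower())


def qa_report(source, target):
    """Compare the compiled target list with the source pages.

    Both arguments are lists of (province, city) tuples. Returns cities that
    are missing from the target, present only in the target, or repeated in it.
    """
    source_keys = [(_clean(p), _clean(c)) for p, c in source]
    target_keys = [(_clean(p), _clean(c)) for p, c in target]
    counts = Counter(target_keys)
    return {
        "missing": sorted(set(source_keys) - set(target_keys)),
        "unexpected": sorted(set(target_keys) - set(source_keys)),
        "duplicates": sorted(k for k, n in counts.items() if n > 1),
    }


if __name__ == "__main__":
    source = [("Sumatera Barat", "Simpang Empat"), ("Riau", "Pekanbaru"), ("Bali", "Denpasar")]
    target = [("Sumatera Barat", "Simpang Empat"), ("Sumatera Barat", "SimpangEmpat"),
              ("Riau", "Pekanbaru")]
    print(qa_report(source, target))
    # {'missing': [('bali', 'denpasar')], 'unexpected': [], 'duplicates': [('sumaterabarat', 'simpangempat')]}
```

The "unexpected" bucket is where entries such as non-capital cities would surface.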
Collectively, the three volunteers (excluding ITOS staff) contributed 89 hours to the project. The vast majority of the work was conducted between June and August 2013. Volunteers located 547 cities; 9 cities they could not locate were later geocoded by ITOS staff. The volunteers spent the majority of their time geocoding, as shown in Figure 1, which breaks down volunteer hours by task.
Figure 1. Volunteer hours by task
The source of each geocode was recorded in the spreadsheet, and the use of multiple sources was encouraged. Of the 547 originally located cities, 126 were located with a single source, 356 with two sources, and 65 with three sources. After duplicates and non-capital cities that should not have been on the target list were removed, the final dataset contained 493 cities.
Figure 2. Frequency of sources used to locate cities
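The source counts behind Figure 2 can be cross-checked with simple arithmetic. This sketch only recomputes figures stated in the text (it does not read the project spreadsheet): the three groups should sum to the 547 originally located cities, and weighting each group by its number of sources gives the total number of source lookups and the average per city.

```python
# Counts stated in the text: number of cities located with 1, 2, and 3 sources
sources_per_city = {1: 126, 2: 356, 3: 65}

total_cities = sum(sources_per_city.values())
total_lookups = sum(k * n for k, n in sources_per_city.items())
mean_sources = total_lookups / total_cities
multi_source_share = (total_cities - sources_per_city[1]) / total_cities

print(total_cities)                        # 547
print(total_lookups)                       # 1033
print(round(mean_sources, 2))              # 1.89
print(round(100 * multi_source_share, 1))  # 77.0
```

So roughly three quarters of the originally located cities were confirmed by more than one source.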
The resulting database and ISO 19139 metadata were published on the GIST Data Repository on 7 October 2013 and are accessible to the public (a free user account is required): https://gistdata.itos.uga.edu/node/59667. Notifications of the dataset's availability were also sent to the Crisis Mappers listserv and to GIST Repository users.
Update (29 June 2016): The GIST Data Repository has been deprecated, and this dataset is now available on the HDX site: https://data.humdata.org/
Figures 3 and 4 show the distribution of cities by region and type, and Figure 5 is a map of the final product.
Figure 3. Number of capital cities, by type in each region of Indonesia.
Figure 4. Count of the number of capital cities by type.
Figure 5. Capital cities of Indonesia