Volunteers assisting in data cleansing of health facilities in Libya
At the request of the World Health Organization (WHO), GISCorps volunteers were deployed to a data cleansing project in Libya. The dataset included the locations of health facilities around the country. The information was collected from various sources by volunteers from two partner organizations: the Stand By Task Force (SBTF) and the Humanitarian OpenStreetMap Team (HOT). The GISCorps volunteers working on this project were Abdel-Rahman Muhsen from Calgary and Case Robertson from Georgia.
The objective of this mission was to obtain a list of geo-coded health facilities as they existed in pre-revolution Libya and then hand that information over to the local teams, who would verify that the facilities existed and that the records were accurate. To accomplish that, WHO asked SBTF and HOT volunteers to search the Internet for those facilities and log the information in a Google spreadsheet that was shared with their members. GISCorps was then asked to provide volunteers to conduct Quality Control (QC) on the collected data.
When GISCorps volunteers joined the team, SBTF and HOT volunteers had already compiled over 600 records in the spreadsheet (more records were added later). Therefore, the first step for our volunteers was to examine the file and devise a QC plan (the plan evolved during the mission). The plan identified the following steps:
1. Standardizing existing fields and adding new fields: after examining the file, it became clear that the content of a few fields needed standardization; the addition of several new fields was also suggested (some by GISCorps and some by other volunteers). Two existing fields were identified as candidates for a “validation” list: the type of Health Facility (hospital, clinic, etc.) and the City/Shabiya field (which later became the City/Village field). In the end, seven new fields were added to the file: Unique HF ID (critical field), Sector Type, Record Status, Date of Data Collection, Link to the Source, New Shabiya, and GC_Comment. A validation list was created for the Sector Type field as well. Our suggestions for new validation lists and additional fields were first applied to a test file and, after review by the other collaborators, were applied to the live spreadsheet (at this point, a few more suggestions came in for additional fields, which were also added to the file; they are included in the list above).
Google Spreadsheet of Health Facilities in Libya.
2. Applying the new standards to the live file: after approval was received, a script was written to extract information about the collected health facilities and populate certain fields with standardized values. In particular, the script populated the Facility Type field (possible values: Hospital, Clinic, Medical Center, etc.) and the Sector Type field (possible values: Public, Private, and Other). The same was done for the City/Shabiya field. Later, however, it was discovered that the city list, which had been obtained from OpenStreetMap, had been revised, so the validation list had to be replaced and the script applied again.
3. GIS-based QC: after applying the standards and adding the new fields, the file was exported and geo-coded. As a result, approximately 360 records were mapped (those that contained a geographic coordinate). Using desktop GIS software, the volunteers divided the country between them and started the QC process on those records. Several QC codes were developed, and volunteers coded each record with one of those pre-assigned codes. Examples of codes were: “DUP” for duplicate, “City = XXX” when the city/village name was inaccurate, “OK” when the record seemed to be accurate, etc. It is important to note that, other than standardization-related corrections, our volunteers did not conduct any QC on records that were missing geographic coordinates. Over a weekend, all 360 mapped records were QC’d and the results were placed in a new field called GC_Comment. That field was then added to the live file, using the Unique HF ID as the join field. At that point, the SBTF and HOT teams were notified that they could examine GISCorps’ feedback and apply the changes when and where appropriate.
Conducting QC on GIS Desktop.
4. Adding the Shabiya name to the file: as part of the GIS QC, the Shabiya name was added to each mapped record, using a GIS layer of Shabiya boundaries provided by WHO.
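Steps 2 through 4 can be illustrated with a minimal Python sketch. This is not the script the volunteers wrote, which is not reproduced here, and the GIS work was done in desktop software rather than in code. The keyword rules, the `Name`, `Latitude`, and `Longitude` column names, the sample records, and the rectangular Shabiya polygons are all hypothetical; only the Facility Type, Unique HF ID, GC_Comment, and New Shabiya field names and the “OK” and “DUP” codes come from the description above.

```python
# Hypothetical keyword rules mapping free text to validated Facility Type values.
KEYWORDS = {
    "hospital": "Hospital",
    "clinic": "Clinic",
    "medical center": "Medical Center",
    "medical centre": "Medical Center",
}


def standardize_facility_type(name):
    """Return a validated Facility Type inferred from the name, or "" if none matches."""
    lowered = name.lower()
    for keyword, value in KEYWORDS.items():
        if keyword in lowered:
            return value
    return ""


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by the ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def process(records, qc_comments, shabiya_polygons):
    """Standardize types, join QC comments by Unique HF ID, and add the Shabiya name."""
    for record in records:
        # Step 2: populate the validated Facility Type field.
        record["Facility Type"] = standardize_facility_type(record.get("Name", ""))
        # Step 3: join GC_Comment using Unique HF ID as the key.
        record["GC_Comment"] = qc_comments.get(record["Unique HF ID"], "")
        # Step 4: only mapped records (those with coordinates) get a Shabiya name.
        record["New Shabiya"] = ""
        if record.get("Latitude") is None or record.get("Longitude") is None:
            continue
        for shabiya, polygon in shabiya_polygons.items():
            if point_in_polygon(record["Longitude"], record["Latitude"], polygon):
                record["New Shabiya"] = shabiya
                break
    return records


# Toy data for illustration only; none of it comes from the real spreadsheet.
records = [
    {"Unique HF ID": "HF001", "Name": "Example Central Hospital",
     "Latitude": 32.0, "Longitude": 13.0},
    {"Unique HF ID": "HF002", "Name": "Example Dental Clinic",
     "Latitude": 31.0, "Longitude": 17.0},
    {"Unique HF ID": "HF003", "Name": "Example Pharmacy",
     "Latitude": None, "Longitude": None},
]
qc_comments = {"HF001": "OK", "HF002": "DUP"}
shabiya_polygons = {
    "Shabiya A": [(10, 30), (15, 30), (15, 33), (10, 33)],
    "Shabiya B": [(15, 30), (20, 30), (20, 33), (15, 33)],
}

process(records, qc_comments, shabiya_polygons)
for record in records:
    print(record["Unique HF ID"], record["Facility Type"],
          record["GC_Comment"], record["New Shabiya"])
```

Keying the join on Unique HF ID rather than on row position means the comments still land on the right records if rows are added or reordered in the live file while the QC is under way.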
After GISCorps completed the above tasks, SBTF and HOT volunteers were invited back to the project and applied more corrections and additions. HOT also developed a mapping application for online data entry, and WHO hopes to make this new tool available to Libyan citizens so that they can add new data to the mapping system directly (as opposed to typing it into a spreadsheet first and mapping the information later). WHO will need help informing the local population about this effort and is currently exploring various possibilities.
This was a very interesting and worthwhile project for GISCorps in many respects. It was our first project with WHO and also our first collaborative project with SBTF and HOT together (we had worked with SBTF and HOT separately in the past). We felt that each team brought a different perspective and skill set to the project, which was highly valuable. The final results are yet to be developed, and we are certain that some of the following suggestions are already being considered; nevertheless, we are listing a few that we deem valuable:
- develop a database schema (a list of fields) and identify the fields that need a validation list “before” data entry begins. This is especially important for future analysis and data processing. Adding a read-only unique identification number to each record from the start of data entry is very important.
- in general, a comparison between what existed in WHO’s GIS layer(s) before this project started and the final geo-coded file is absolutely critical. The comparison should tag each record that was obtained “only” as a result of this effort and then look into its source and usability. That could be helpful in determining the effectiveness of crowdsourced data for projects of this nature.
- obtain estimates of the time spent from all three organizations and from the WHO staff who worked on this mission.
- test and evaluate HOT’s new application as a data entry tool. If the tool works well in various scenarios, it could make data entry more efficient, since users can “visualize” the information that they are about to post, which could prevent duplicates. It can also help in entering other geographic information accurately, such as the city/village or Shabiya names, since volunteers can see the layers in that environment. It would also be great if the form supported validation lists to prevent inconsistencies in data entry.
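The comparison suggested in the second item above could be sketched as follows. This is only an illustration under stated assumptions: it treats a crowdsourced record as already known if it lies within a distance threshold of any facility in the WHO layer, which is one possible matching rule and not one the project specified. The `lat`/`lon` keys, the `Source Tag` field, the 0.5 km threshold, and the coordinates are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def tag_new_records(crowd_records, who_records, threshold_km=0.5):
    """Tag each crowdsourced record as already in the WHO layer or obtained only by this effort."""
    for record in crowd_records:
        near_existing = any(
            haversine_km(record["lat"], record["lon"], w["lat"], w["lon"]) <= threshold_km
            for w in who_records
        )
        record["Source Tag"] = "In WHO layer" if near_existing else "Crowdsourced only"
    return crowd_records


# Toy coordinates for illustration only.
who_layer = [{"lat": 32.0, "lon": 13.0}]
crowd = [
    {"Unique HF ID": "HF001", "lat": 32.001, "lon": 13.0},  # roughly 0.1 km from the WHO point
    {"Unique HF ID": "HF002", "lat": 32.5, "lon": 13.5},    # tens of km away
]

tag_new_records(crowd, who_layer)
new_only = [r["Unique HF ID"] for r in crowd if r["Source Tag"] == "Crowdsourced only"]
print(new_only)
```

A distance-only rule will miss facilities whose coordinates differ between sources and may merge distinct facilities that sit close together, so in practice the tagged records would still need the manual look at source and usability described above.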