A-Background
Vector GIS File Formats

Vectors are mathematical representations of physical coordinates, often timestamped to provide temporal context.
File Type Extension Description
Esri Shapefile .SHP, .DBF, .SHX

The shapefile is the most common geospatial file type, and it is supported by virtually all commercial and open source GIS software. Realistically, it is too widely used for any platform to attempt to supplant it with a proprietary format, which makes it the de facto industry standard. Three files are necessary to make up a shapefile. They are:

  • SHP - for feature geometry
  • SHX - for shape index position
  • DBF - for attribute data

An additional set of files may be included at the user's preference. They are:

  • PRJ - for the projection system
  • XML - for the associated metadata
  • SBN - for the spatial index used to optimise queries
  • SBX - the companion file to the SBN spatial index, also used to speed up loading
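
Because a shapefile is really a set of sibling files, it can be useful to check that the mandatory parts are present before handing the data to a GIS package. The following is a minimal sketch using only the Python standard library; the function name `shapefile_components` is hypothetical, and the extension lists simply mirror the ones given above.

```python
from pathlib import Path

# Extensions taken from the lists above: three mandatory, four optional.
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".xml", ".sbn", ".sbx")


def shapefile_components(shp_path):
    """Return (missing_required, present_optional) for the files next to shp_path."""
    base = Path(shp_path).with_suffix("")
    # Collect the final extension of every sibling file that shares the base name,
    # lower-cased so that .SHP and .shp are treated alike.
    present = {p.suffix.lower() for p in base.parent.glob(base.name + ".*")}
    missing = [ext for ext in REQUIRED if ext not in present]
    extras = [ext for ext in OPTIONAL if ext in present]
    return missing, extras
```

Calling `shapefile_components("data/roads.shp")` on a folder that holds only `roads.shp` and `roads.dbf` would report `.shx` as missing, which is usually enough to explain why a dataset refuses to open.
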
Geographic JavaScript Object Notation (GeoJSON) .GEOJSON, .JSON

GeoJSON is primarily used for web-based mapping. It stores coordinates as text in JavaScript Object Notation (JSON) form, covering vector points, lines and polygons plus tabular information.

GeoJSON stores objects inside {} curly brackets, and overall the format has less markup overhead than GML. GeoJSON is favoured because its straightforward structure and syntax are easily modified in code platforms and even in plain text editors such as Windows Notepad or macOS TextEdit.

Because GeoJSON is plain JSON, every standard browser can read it with its built-in JavaScript JSON parser, which converts the text into ordinary JavaScript objects. Drawing those objects on a map still requires a mapping library or custom code.
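
To make the structure concrete, here is a small sketch that builds a GeoJSON feature collection, serialises it to text and parses it back, using Python's standard `json` module. The feature name and coordinates are illustrative only; note that GeoJSON orders each position as longitude first, then latitude.

```python
import json

# A hypothetical point feature; the name and coordinates are illustrative only.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-0.1276, 51.5072]},  # [lon, lat]
    "properties": {"name": "Example point"},  # the tabular (attribute) information
}
collection = {"type": "FeatureCollection", "features": [feature]}

# Serialise to plain text that could be saved as a .geojson file or hand-edited.
text = json.dumps(collection, indent=2)

# Parse the text back into ordinary dictionaries and lists.
parsed = json.loads(text)
lon, lat = parsed["features"][0]["geometry"]["coordinates"]
```
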

Geography Markup Language (GML) .GML

GML is an XML grammar that enables geographical coordinates and features to be expressed as an extension of XML. XML is 'eXtensible Markup Language,' and is favoured because it is readable by machines as well as by people.

GML stores geographic entities, or features, in text format. As with GeoJSON, GML can be updated in basic text editors as well as code platforms. Each feature stored as a text entry in GML carries a set of properties, vector points (as points or forming lines, curves, surfaces and polygons), plus a spatial reference system.

There is generally more markup overhead in GML than in GeoJSON. However, this is largely because GML carries more context and information about the data it holds.
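
As an illustration of a feature carrying properties, geometry and a spatial reference system, the sketch below parses a small GML-style fragment with Python's standard `xml.etree.ElementTree`. The `Feature` and `name` elements are hypothetical stand-ins for an application schema, the namespace URI shown is the one used by GML 3.1 (an assumption about the version), and the coordinate values are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by GML 3.1 documents (assumed version for this sketch).
GML_NS = {"gml": "http://www.opengis.net/gml"}

# Illustrative fragment; 'Feature' and 'name' are hypothetical application elements.
gml_text = """
<Feature xmlns:gml="http://www.opengis.net/gml">
  <name>Example point</name>
  <gml:Point srsName="EPSG:4326">
    <gml:pos>51.5072 -0.1276</gml:pos>
  </gml:Point>
</Feature>
"""

root = ET.fromstring(gml_text)
point = root.find("gml:Point", GML_NS)

name = root.findtext("name")                # a feature property
srs = point.get("srsName")                  # the spatial reference system
coords = [float(v) for v in point.find("gml:pos", GML_NS).text.split()]
```
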

Google Keyhole Markup Language (KML/KMZ) .KML, .KMZ

Keyhole Markup Language, or KML, is a widely known GIS format. KML is XML-based, and was developed by a software startup called Keyhole Inc that was eventually acquired by Google (now part of Alphabet). As the format used by Google Earth, it is likely the GIS format used by the largest number of people (albeit people who aren't in GIS).

The subsidiary format KMZ, which is zipped KML, replaced KML as the default Google Earth geospatial format because of the storage benefits of a compressed file format.
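
Since a KMZ is a zip archive wrapped around a KML document, the relationship can be sketched with Python's standard `zipfile` module. The placemark content is illustrative, and naming the main file `doc.kml` follows the usual convention rather than anything stated above.

```python
import io
import zipfile

# A minimal, illustrative KML document.
kml_text = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark><name>Example</name></Placemark>
</kml>
"""

# Write the KML into an in-memory zip archive; on disk this would be a .kmz file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
    kmz.writestr("doc.kml", kml_text)

# Reading a KMZ is the reverse: open the archive and extract the KML text.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as kmz:
    names = kmz.namelist()
    restored = kmz.read("doc.kml").decode("utf-8")
```
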

KML/KMZ was adopted as an international standard of the Open Geospatial Consortium in 2008.

KML/KMZ's longitude and latitude components are in decimal degrees, as defined by the World Geodetic System of 1984 (WGS84). The vertical component, or altitude, is measured in metres, in accordance with the WGS84 EGM96 geoid vertical datum.
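
The coordinate conventions above can be seen in a point placemark, where each coordinate tuple is written as longitude, latitude, altitude. The following is a minimal sketch built with Python's standard `xml.etree.ElementTree`; the helper name `placemark_kml` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark_kml(name, lon, lat, alt_m=0.0):
    """Build a minimal KML document containing one point placemark."""
    # Serialise the KML namespace as the default namespace (no prefix).
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # Decimal degrees for longitude and latitude, metres for altitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},{alt_m}"
    return ET.tostring(kml, encoding="unicode")
```

For example, `placemark_kml("Example", -0.1276, 51.5072, 35.0)` returns a document whose `coordinates` element lists the longitude before the latitude, which is the reverse of how coordinates are often spoken aloud.
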

GPS eXchange Format (GPX) .GPX

GPS Exchange Format, usually referred to as GPX, is another XML-based format. It is important to ensure interlocutors understand that the reference is to the geospatial file format, rather than to the Global Positioning System (GPS) itself. GPX describes waypoints, tracks and routes as captured by a GPS receiver. Because GPX is an exchange format, transfer between files and platforms is straightforward, with users able to move GPS data to a new platform simply on the basis of its descriptive properties.

At a minimum, GPX requires latitude and longitude coordinates. As an extension, GPX files can optionally store attributes including time and elevation as tags.
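
The split between required coordinates and optional tags can be sketched by reading waypoints with Python's standard `xml.etree.ElementTree`. The sample document and the helper name `read_waypoints` are illustrative; the namespace is the one used by GPX 1.1, which is assumed here.

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX 1.1 documents (assumed version for this sketch).
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Illustrative data: the second waypoint carries only the mandatory lat/lon.
gpx_text = """
<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <wpt lat="51.5072" lon="-0.1276">
    <ele>35.0</ele>
    <time>2024-01-01T12:00:00Z</time>
  </wpt>
  <wpt lat="48.8566" lon="2.3522"/>
</gpx>
"""


def read_waypoints(text):
    """Return a list of dicts with lat, lon and the optional ele and time."""
    root = ET.fromstring(text)
    waypoints = []
    for wpt in root.findall("gpx:wpt", NS):
        ele = wpt.findtext("gpx:ele", default=None, namespaces=NS)
        waypoints.append({
            "lat": float(wpt.get("lat")),   # required attribute
            "lon": float(wpt.get("lon")),   # required attribute
            "ele": float(ele) if ele is not None else None,  # optional tag
            "time": wpt.findtext("gpx:time", default=None, namespaces=NS),
        })
    return waypoints
```
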

IDRISI Vector .VCT, .VDC

IDRISI vector data files have a VCT extension, with the associated vector documentation held in a file with a VDC extension.

VCT files are limited to vector data such as points, lines and polygons, plus additional attributes in text and imagery. Conveniently, the IDRISI vector file system automatically creates a documentation file for progressively building metadata.

Associated attributes are stored directly in the VCT vector files, and independent data tables and value files can be included as desired.

MapInfo TAB .TAB, .DAT, .ID, .MAP, .IND

The MapInfo TAB format is a proprietary format of MapInfo software. Like shapefiles, MapInfo TAB requires a set of files to represent geographic information and attributes.

  • TAB format files - this is an ASCII format that links the associated ID, DAT, MAP and IND files
  • DAT format files - these contain the tabular data, stored in dBase DBF format
  • ID format files - these are index files that link graphical objects to database information
  • MAP format files - these hold the map objects that store the geographic data points
  • IND format files - these are index files for any carried tabular data
OpenStreetMap OSM XML .OSM

OSM files are a native OpenStreetMap format. OpenStreetMap is part of the crowd collaboration movement and is one of the largest crowdsourced GIS data projects in the world. OSM files contain a collection of vector features assembled by the community.

OSM is OpenStreetMap's XML-based file format. An associated, more compact format called Protocolbuffer Binary Format (PBF) is used as an alternative to OSM XML.
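
As a rough sketch of what the XML form contains, the example below reads a tiny OSM-style document with Python's standard `xml.etree.ElementTree`. It relies on the OSM XML convention that nodes carry coordinates and ways reference nodes by id; the ids, coordinates and tag are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative OSM XML: two nodes joined by one way tagged as a road.
osm_text = """
<osm version="0.6">
  <node id="1" lat="51.5072" lon="-0.1276"/>
  <node id="2" lat="51.5080" lon="-0.1260"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""

root = ET.fromstring(osm_text)

# Index every node's (lat, lon) by its id.
nodes = {
    n.get("id"): (float(n.get("lat")), float(n.get("lon")))
    for n in root.findall("node")
}

# Resolve each way's node references into coordinates and collect its tags.
ways = []
for way in root.findall("way"):
    ways.append({
        "id": way.get("id"),
        "coords": [nodes[nd.get("ref")] for nd in way.findall("nd")],
        "tags": {t.get("k"): t.get("v") for t in way.findall("tag")},
    })
```
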

QGIS can load native OSM files, and the OpenStreetMap plugin can convert PBF to OSM so that it can then be imported into QGIS.

Digital Line Graph (DLG) .DLG

Connecting traditional with modern, Digital Line Graph (DLG) files are vector files generated from traditional paper topographic maps. Townships, boundaries, contour lines, rivers, lakes, roads, railroads and towns have been mapped into DLG.

The United States government used DLG for a large amount of its map intelligence.

Geographic Base File-Dual Independent Map Encoding (GBF-DIME)

Another United States government format, GBF-DIME was developed by the US Census Bureau in the late 1960s. It is notable as one of the first GIS data formats to exist, and because it stored significant datasets such as the US road network around major urban areas, which was used to extrapolate census information and community service requirements.

GBF-DIME supports choropleth mapping and also helps to remove errors when digitising features. DIME was a key precursor to the current TIGER (Topologically Integrated Geographic Encoding and Referencing) system, which was also produced by the US Census Bureau.

ArcInfo Coverage  

ArcInfo Coverages are a collection of folders containing points, arcs, polygons and information annotations. Tics are geographic control points and help define the extent of the coverage.

Attributes are stored in ADF files or INFO tables, and each feature is identified by a unique number. These feature numbers are used to connect attribute data with each spatial feature.
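
The linkage by feature number amounts to a simple key-based join. The sketch below shows the idea with plain Python dictionaries; it does not read real coverage files, and all names and values are hypothetical.

```python
# Hypothetical geometries and attribute rows, each keyed by a unique feature number.
geometries = {
    1: [(0.0, 0.0), (1.0, 1.0)],
    2: [(1.0, 1.0), (2.0, 0.5)],
}
attributes = {
    1: {"road_name": "Example Street"},
    2: {"road_name": "Sample Avenue"},
}

# Attach each attribute row to its geometry through the shared feature number.
features = [
    {"feature_id": fid, "geometry": geom, "attributes": attributes.get(fid, {})}
    for fid, geom in sorted(geometries.items())
]
```
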

Coverage was the standard format utilised during the floppy disk era, but is largely obsolete and unsupported.