
GeoBases

Data services and visualization


News!

Version 5 has just been released. Check out the release notes!

Introduction

This project provides tools to play with geographical data. It also works with non-geographical data, except for map visualizations :).

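The project embeds several data sources, but you can just as easily load your own data alongside them. After data loading, you can look up entries by key, filter on field values, compute distances, search near a point or another entry, find the closest entries, match names approximately, and display the results on a map (see Features below).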

The project is written entirely in Python. The core part is a Python package, but there is a command line tool as well! Install it with easy_install, then display on a map the airports that have "international" in their name:

$ GeoBase --fuzzy international --map

You can perform all types of queries:

$ GeoBase --base cities --fuzzy "san francisko" # typo here :)

Of course, you can use your own data for map display:

$ cat coords.csv
p1,48.22,2.33
p2,49.33,2.24
$ cat coords.csv | GeoBase --map

And for other kinds of display as well, such as graphs:

$ cat edges.csv
A,B
A,C
D,A
$ cat edges.csv | GeoBase --graph

Administer the data sources:

$ GeoBase --admin

We are currently gathering input from the community to define the features of the next version, so do not hesitate to open issues on the GitHub page.

Documentation

Here are some useful links:

Installation

Prerequisites

These prerequisites are very standard packages which are often installed by default on Linux distributions. But make sure you have them anyway.

First you need to install setuptools (as root):

$ apt-get install python-setuptools    # for debian
$ yum install python-setuptools.noarch # for fedora

Then you need some basic compilation tools to build the dependencies (also as root):

$ apt-get install python-dev g++    # for debian
$ yum install python-devel gcc-c++  # for fedora

From PyPI

You can install it from PyPI:

$ easy_install --user -U GeoBases

There is a development version also on PyPI:

$ easy_install --user -U GeoBasesDev

From GitHub

You can clone the project from GitHub:

$ git clone https://github.com/opentraveldata/geobases.git

Then install the package and its dependencies:

$ cd geobases
$ python setup.py install --user # for user space

Final steps

A script is installed in ~/.local/bin. To be able to use it, add the following to your ~/.bashrc or ~/.zshrc:

export PATH=$PATH:$HOME/.local/bin
export BACKGROUND_COLOR=black # or 'white', your call

If you use zsh and want to have awesome autocomplete for the main script, add this to your ~/.zshrc:

# Add custom completion scripts
fpath=(~/.zsh/completion $fpath)
autoload -U compinit
compinit

Python 3 support

Python 3 is supported. To try it, switch branches before installation. Install setuptools and python3-dev as prerequisites, then:

$ git checkout 3000
$ python3 setup.py install --user

You can also install the package from PyPI:

$ easy_install-3.2 --user -U GeoBases3K

Quickstart

>>> from GeoBases import GeoBase
>>> geo_o = GeoBase(data='ori_por', verbose=False)
>>> geo_a = GeoBase(data='airports', verbose=False)
>>> geo_t = GeoBase(data='stations', verbose=False)

You can provide other values for the data parameter. All data sources are documented in a single YAML file:

All features are agnostic to the underlying data, and are available as long as the headers are properly set, either in the configuration file or from the Python API. For geographical features, the latitude field must be named lat and the longitude field lng.
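
As a rough illustration of that naming convention, here is a minimal sketch that reads name, latitude, longitude rows (like the coords.csv sample in the introduction) into records whose geographical fields are called lat and lng. It only uses the standard library; the load_points helper and the HEADERS list are hypothetical names for this example and are not part of the GeoBases API.

```python
import csv
import io

# Hypothetical header list: the geographical fields are named 'lat' and 'lng'.
HEADERS = ["name", "lat", "lng"]


def load_points(stream, headers=HEADERS):
    """Read delimited rows and return {name: (lat, lng)} with float coordinates."""
    points = {}
    for row in csv.DictReader(stream, fieldnames=headers):
        points[row["name"]] = (float(row["lat"]), float(row["lng"]))
    return points


# Same two rows as the coords.csv sample shown earlier.
sample = io.StringIO("p1,48.22,2.33\np2,49.33,2.24\n")
points = load_points(sample)
print(points["p1"])
```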

Features

Information access

>>> geo_o.get('CDG', 'city_code')
'PAR'
>>> geo_o.get('BRU', 'name')
'Bruxelles National'
>>> geo_t.get('frnic', 'name')
'Nice-Ville'
>>> geo_t.get('fr_not_exist', 'name', default='NAME')
'NAME'

You can put your own data in a GeoBase instance, either by loading your own file when creating the instance, or by creating an empty instance and using the set method.

Find things with properties

>>> conditions = [('city_code', 'PAR'), ('location_type', ('H',))]
>>> list(geo_o.findWith(conditions, mode='and'))
[(2, 'JDP'), (2, 'JPU')]
>>>
>>> conditions = [('city_code', 'PAR'), ('city_code', 'LON')]
>>> len(list(geo_o.findWith(conditions, mode='or')))
36
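
To make the 'and' and 'or' modes concrete, here is a minimal standalone sketch of this kind of filtering over a plain dictionary. It is not the GeoBases implementation: the find_with function and the sample records are hypothetical, and returning the number of matched conditions as the first tuple element is an assumption inferred from the output above.

```python
def find_with(records, conditions, mode="and"):
    """Yield (matched_count, key) for records satisfying the conditions.

    mode='and' requires every condition to match,
    mode='or' requires at least one.
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    for key, fields in records.items():
        matched = sum(1 for field, value in conditions if fields.get(field) == value)
        if mode == "and" and matched == len(conditions):
            yield matched, key
        elif mode == "or" and matched > 0:
            yield matched, key


# Hypothetical sample records, not taken from the embedded data sources.
records = {
    "AAA": {"city_code": "PAR", "location_type": "H"},
    "BBB": {"city_code": "PAR", "location_type": "A"},
    "CCC": {"city_code": "LON", "location_type": "A"},
}

both = list(find_with(records, [("city_code", "PAR"), ("location_type", "H")], mode="and"))
either = list(find_with(records, [("city_code", "PAR"), ("city_code", "LON")], mode="or"))
print(both, len(either))
```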

Distance computation

>>> geo_o.distance('CDG', 'NCE')
694.5162...
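
For readers curious about what such a distance means, here is a minimal sketch of a great-circle distance using the haversine formula. This is an assumption about the kind of computation involved, not a description of what GeoBases does internally; the coordinates below are approximate values chosen for illustration, and the Earth radius is the usual mean value.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1 = map(radians, p1)
    lat2, lng2 = map(radians, p2)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates, for illustration only.
cdg = (49.0097, 2.5479)
nce = (43.6584, 7.2159)
d = haversine_km(cdg, nce)
print(round(d, 1))
```

With these inputs the result lands within a few kilometres of the figure shown above; small differences are expected because the coordinates and the radius are approximations.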

Find things near a geocode

>>> # Paris, airports <= 40km
>>> [k for _, k in sorted(geo_a.findNearPoint((48.84, 2.367), 40))]
['ORY', 'LBG', 'TNF', 'CDG']
>>>
>>> # Nice, stations <= 4km
>>> iterable = geo_t.findNearPoint((43.70, 7.26), 4)
>>> [geo_t.get(k, 'name') for _, k in iterable]
['Nice-Ville', 'Nice-St-Roch', 'Nice-Riquier']
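
A radius search like this can be sketched as a brute-force scan that keeps every point within the given distance. The version below is a standalone illustration with approximate sample coordinates; find_near_point and haversine_km are hypothetical helpers, and GeoBases itself may use a smarter index.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1 = map(radians, p1)
    lat2, lng2 = map(radians, p2)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def find_near_point(points, center, radius_km):
    """Return (distance, key) pairs within radius_km of center, nearest first."""
    hits = []
    for key, coords in points.items():
        dist = haversine_km(center, coords)
        if dist <= radius_km:
            hits.append((dist, key))
    return sorted(hits)


# Approximate coordinates, for illustration only.
points = {
    "ORY": (48.7262, 2.3652),
    "CDG": (49.0097, 2.5479),
    "NCE": (43.6584, 7.2159),
}
near_paris = find_near_point(points, (48.84, 2.367), 40)
print([key for _, key in near_paris])
```

Searching near a key instead of a point would only add one step: look up the coordinates of that key, then run the same scan.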

Find things near another thing

>>> sorted(geo_a.findNearKey('ORY', 50)) # Orly, airports <= 50km
[(0.0, 'ORY'), (18.8..., 'TNF'), (27.8..., 'LBG'), (34.8..., 'CDG')]
>>>
>>> sorted(geo_t.findNearKey('frnic', 3)) # Nice station, <= 3km
[(0.0, 'frnic'), (2.2..., 'fr4342'), (2.3..., 'fr5737')]

Find closest things from a geocode

>>> list(geo_a.findClosestFromPoint((43.70, 7.26))) # Nice
[(5.82..., 'NCE')]
>>>
>>> list(geo_a.findClosestFromPoint((43.70, 7.26), N=3)) # Nice
[(5.82..., 'NCE'), (30.28..., 'CEQ'), (79.71..., 'ALL')]
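
The "N closest" idea can be sketched with heapq.nsmallest over (distance, key) pairs. As before, this is a standalone illustration under stated assumptions, not the library's code: find_closest_from_point is a hypothetical name and the coordinates are approximate.

```python
import heapq
from math import asin, cos, radians, sin, sqrt


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1 = map(radians, p1)
    lat2, lng2 = map(radians, p2)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def find_closest_from_point(points, center, n=1):
    """Return the n nearest (distance, key) pairs, nearest first."""
    pairs = ((haversine_km(center, coords), key) for key, coords in points.items())
    return heapq.nsmallest(n, pairs)


# Approximate coordinates, for illustration only.
points = {
    "ORY": (48.7262, 2.3652),
    "CDG": (49.0097, 2.5479),
    "NCE": (43.6584, 7.2159),
}
closest = find_closest_from_point(points, (43.70, 7.26))
two_closest = find_closest_from_point(points, (43.70, 7.26), n=2)
print(closest[0][1], [key for _, key in two_closest])
```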

Approximate name matching

>>> geo_t.fuzzyFind('Marseille Charles', 'name')[0]
(0.8..., 'frmsc')
>>> geo_a.fuzzyFind('paris de gaulle', 'name')[0]
(0.78..., 'CDG')
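
Approximate matching returns a similarity score with each key. The sketch below shows the general idea with difflib.SequenceMatcher from the standard library. The scoring method GeoBases actually uses is not stated here, so the scores will not necessarily match the ones above; fuzzy_find is a hypothetical helper and the 'frxxx' entry is made-up sample data.

```python
from difflib import SequenceMatcher


def fuzzy_find(query, names, limit=3):
    """Return up to `limit` (score, key) pairs, best match first.

    Scores are SequenceMatcher ratios in [0, 1], compared case-insensitively.
    """
    query = query.lower()
    scored = [
        (SequenceMatcher(None, query, name.lower()).ratio(), key)
        for key, name in names.items()
    ]
    return sorted(scored, reverse=True)[:limit]


names = {
    "frmsc": "Marseille-St-Charles",
    "frnic": "Nice-Ville",
    "frxxx": "Lyon-Part-Dieu",  # hypothetical key, for illustration only
}
best_score, best_key = fuzzy_find("Marseille Charles", names)[0]
print(best_key)
```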

Map display

>>> geo_t.visualize()
> Affecting category None     to color blue    | volume 3190
* Added lines for duplicates linking, total 0

* Now you may use your browser to visualize:
example_map.html example_table.html

* If you want to clean the temporary files:
rm example.json ...

(['example_map.html', 'example_table.html'], 2)

Standalone script

Installation of the package will also deploy a standalone script named GeoBase:

$ GeoBase ORY CDG              # query on the keys ORY and CDG
$ GeoBase --closest CDG        # closest from CDG
$ GeoBase --near LIG           # near LIG
$ GeoBase --fuzzy marseille    # fuzzy search on 'marseille'
$ GeoBase --help               # your best friend

In the previous picture, you have an overview of the command line verbose display. Three displays are available for the command line tool:

With the verbose display, each entry is shown in its own column and each available field on its own line. Fields starting with __, like __field__, are special: they were added during data loading.

More examples follow, for instance how to search on a field such as adm1_code (B8 is the French Riviera):

$ GeoBase -E adm1_code -e B8

Same search with CSV output (customize with --show):

$ GeoBase -E adm1_code -e B8 --quiet

Make a fuzzy search:

$ GeoBase --fuzzy sur mer

All entries within 200 km of Paris:

$ GeoBase --near PAR -N 200

Map display for a specific GMT offset:

$ GeoBase -E gmt_offset -e 1.0 --map

Reading data input directly on stdin:

$ echo -e 'ORY^Orly\nCDG^Charles' | GeoBase