Information Extraction for Open Data


Empowering Citizens By Automatically Sifting Through Large Amounts of Text

Information Extraction

"Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most of the cases this activity concerns processing human language texts by means of natural language processing (NLP)" from Wikipedia.
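To make the definition concrete, here is a minimal sketch of turning one unstructured sentence into a structured record. The sentence, the pattern, and the `extract_award` function are all invented for illustration and are not taken from any IE4OpenData project; real systems generally rely on NLP models rather than a single hand-written rule.

```python
import re
from typing import Optional

# Hypothetical rule for sentences shaped like:
# "... contract to <company> for $<amount> for <reason>."
PATTERN = re.compile(
    r"contract to (?P<company>.+?) for \$(?P<amount>[\d,]+(?:\.\d+)?) for (?P<reason>.+?)\.$"
)


def extract_award(sentence: str) -> Optional[dict]:
    """Return a structured record, or None if the sentence does not match."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return {
        "company": match.group("company"),
        # Drop thousands separators before converting to a number.
        "amount": float(match.group("amount").replace(",", "")),
        "reason": match.group("reason"),
    }


# Invented example sentence, not real data.
example = "The council awarded a contract to Acme Paving Inc. for $125,000 for road repairs."
record = extract_award(example)
print(record)
```

A rule like this breaks as soon as the wording changes, which is why information extraction over real documents usually combines such patterns with statistical or neural NLP components.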

Learn more

Check out our CLEI 2019 paper, "Impact of Spanish Dialect in Deep Learning Next Sentence Predictors". The code and data are available on our GitHub organization.


Octroy. Extracting company, amount and reason from the executive proceedings of the governments of the cities of Laval and Montreal. In French.

Montreal City Hall (source: Wikipedia)

Voz y voto. Extracting speakers and speaker mentions, and automatically identifying the gender of national representatives, in the transcripts of the Argentine Congress. In Spanish.

Palace of the Argentine National Congress (source: Wikipedia)

More to come. Bring your data, your problem or your system under the umbrella of the IE4OpenData project.

Headquarters of the United Nations (source: Wikipedia)

Joining Us

IE4OpenData is hosted as a GitHub organization, so joining is relatively easy. As this project is just starting, the easiest way to join is to contact us by email.

Email Us