"Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most of the cases this activity concerns processing human language texts by means of natural language processing (NLP)" from Wikipedia.
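As a rough illustration of the definition above, the sketch below turns one unstructured sentence into a structured record. The sentence, the `GRANT_PATTERN` regular expression, and the `extract_grant` helper are all made up for this example and are not taken from any IE4OpenData project; real documents vary far more than a single pattern can handle.

```python
import re
from typing import Optional

# Hypothetical pattern for one invented sentence shape:
# "... awarded $<amount> to <company> for <reason>."
GRANT_PATTERN = re.compile(
    r"awarded \$(?P<amount>[\d,]+(?:\.\d{2})?) to (?P<company>.+?) for (?P<reason>.+?)\.?$"
)


def extract_grant(sentence: str) -> Optional[dict]:
    """Return a structured record for a matching sentence, or None."""
    match = GRANT_PATTERN.search(sentence)
    if match is None:
        return None
    return {
        "company": match.group("company"),
        # Drop thousands separators before converting to a number.
        "amount": float(match.group("amount").replace(",", "")),
        "reason": match.group("reason"),
    }


# Invented example sentence, not real data.
record = extract_grant("The council awarded $12,500 to Acme Inc. for snow removal.")
print(record)
```

A hand-written pattern like this only covers text that follows the assumed shape, which is one reason such tasks are usually approached with NLP methods rather than fixed rules.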
Learn more
Check out our CLEI 2019 paper, "Impact of Spanish Dialect in Deep Learning Next Sentence Predictors"; the code and data are available on our GitHub organization.
Octroy. Extracting company, amount and reason from the executive proceedings of the governments of the cities of Laval and Montreal. In French.
Voz y voto. Extracting speakers and speaker mentions, and automatically identifying the gender of national representatives, in the transcripts of the Argentine Congress. In Spanish.
More to come. Bring your data, your problem or your system under the umbrella of the IE4OpenData project.
IE4OpenData is hosted as a GitHub organization, so joining is relatively easy. As the project is just starting, the easiest way to join is to contact us by email.
Email Us