Ukrainian POS tag dictionary for Morfologik

Ukrainian part-of-speech dictionaries in Morfologik binary format

License	The Apache Software License, Version 2.0
Categories	Net Search Business Logic Libraries
GroupId	ua.net.nlp
ArtifactId	morfologik-ukrainian-search
Last Version	4.9.1
Release Date	Mar 15, 2020
Type	jar
Description	Ukrainian POS tag dictionary for Morfologik. Ukrainian part-of-speech dictionaries in Morfologik binary format
Project URL	https://github.com/brown-uk/dict_uk
Source Code Management	https://github.com/brown-uk/dict_uk.git

Download morfologik-ukrainian-search

Filename	Size
morfologik-ukrainian-search-4.9.1.pom
morfologik-ukrainian-search-4.9.1.jar	3 MB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/ua.net.nlp/morfologik-ukrainian-search/ -->
<dependency>
    <groupId>ua.net.nlp</groupId>
    <artifactId>morfologik-ukrainian-search</artifactId>
    <version>4.9.1</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/ua.net.nlp/morfologik-ukrainian-search/
implementation 'ua.net.nlp:morfologik-ukrainian-search:4.9.1'

Gradle Kotlin

// https://jarcasting.com/artifacts/ua.net.nlp/morfologik-ukrainian-search/
implementation ("ua.net.nlp:morfologik-ukrainian-search:4.9.1")

Apache Buildr

'ua.net.nlp:morfologik-ukrainian-search:jar:4.9.1'

Apache Ivy

<dependency org="ua.net.nlp" name="morfologik-ukrainian-search" rev="4.9.1">
  <artifact name="morfologik-ukrainian-search" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='ua.net.nlp', module='morfologik-ukrainian-search', version='4.9.1')
)

Scala SBT

libraryDependencies += "ua.net.nlp" % "morfologik-ukrainian-search" % "4.9.1"

Leiningen

[ua.net.nlp/morfologik-ukrainian-search "4.9.1"]

Dependencies

There are no dependencies for this project. It is a standalone project that does not depend on any other jars.

Project Modules

There are no modules declared in this project.

This is the Large Electronic Dictionary of Ukrainian (ВЕСУМ, VESUM).

This is a project to generate a POS tag dictionary for the Ukrainian language.

Description

The dictionary contains words and their paradigms with the corresponding tags, as well as other information,
including:
* additional tags: slang, rare, bad...
* suggested replacements for barbarisms
* links between base and comparative forms of adjectives
* case government for adjectives

For every file in data/dict, this project generates all possible word forms with part-of-speech tags,
using the affix rules in the data/affix directory.

More detailed information is in the doc/ directory.

Software requirements

Java (JDK >= 8)
4 GB of free memory

Usage

You can do two things with the dictionary:

generate all possible word forms for words already in the dictionary (see the "How to run" section below)
generate forms for arbitrary words in interactive mode: more details

How to install

Install Java (JDK 8 or newer)
(Windows only) install and start git bash
Clone the project: git clone https://github.com/brown-uk/dict_uk.git
Enter the project directory: cd dict_uk

How to run

`./gradlew expand`

or on Windows:

`bin/expand_win.sh`

Output:

out/dict_corp_vis.txt - dictionary in visual format (indented, grouped by lemma) for viewing, analysis, and processing
out/dict_corp_lt.txt - dictionary in tabular format for use in software; in particular, this file is used to generate the morfologik dictionary used in LanguageTool
out/words.txt - list of all known word forms
out/words_spell.txt - list of all known word forms that are valid for spelling
out/lemmas.txt - list of lemmas

License

The dictionary data are available for use under the terms of the "Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License" (https://creativecommons.org/licenses/by-nc-sa/4.0/)

The software is freely distributed under the terms of the GPL version 3.

Note: derivative projects have their own licenses

In addition, the materials of this project may be used in the projects https://voice.mozilla.org/uk and https://common-voice.github.io/sentence-collector/#/ in accordance with their licenses.

Derivative projects

Description

For all files in data/dict the project generates all possible word forms with POS tags
by using affix rules from files in data/affix.
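
To make the idea of affix-based expansion concrete, here is a minimal sketch in Java. It is not the project's implementation and does not use the real rule syntax from data/affix: the `Rule` class, the "strip a suffix, add a suffix" rule shape, the tags `noun:gen` and `noun:dat`, and the transliterated sample word are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: the rule shape and tags below are hypothetical,
// not the actual dict_uk affix format or tagset.
final class AffixSketch {

    /** One toy suffix rule: if the lemma ends with `strip`, replace that ending with `add`. */
    static final class Rule {
        final String strip;
        final String add;
        final String tag;

        Rule(String strip, String add, String tag) {
            this.strip = strip;
            this.add = add;
            this.tag = tag;
        }
    }

    /** Applies every matching rule to the lemma and returns "form lemma tag" lines. */
    static List<String> expand(String lemma, List<Rule> rules) {
        List<String> out = new ArrayList<>();
        for (Rule rule : rules) {
            if (lemma.endsWith(rule.strip)) {
                String stem = lemma.substring(0, lemma.length() - rule.strip.length());
                out.add(stem + rule.add + " " + lemma + " " + rule.tag);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("a", "y", "noun:gen"));   // knyha -> knyhy
        rules.add(new Rule("ha", "zi", "noun:dat")); // knyha -> knyzi (consonant alternation)
        // "knyha" is a transliteration of a Ukrainian noun, kept ASCII-only here.
        for (String line : expand("knyha", rules)) {
            System.out.println(line);
        }
    }
}
```

Rules that do not match the lemma's ending are skipped, so a single rule set can serve many lemmas. The actual rule files may encode far more (conditions, tag inheritance, exceptions), so treat this only as a mental model.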

Required software

Java (JDK >= 8)
4 GB of free RAM
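
The requirements above can be checked before starting a long build. The following is a hypothetical helper, not part of dict_uk: the class name `PreflightCheck` and its methods are assumptions. Note that the maximum heap of the current JVM is only a rough proxy for free RAM, and the Gradle build may configure its own memory settings.

```java
// Hypothetical helper, not part of dict_uk: reports the Java version and the
// heap available to this JVM so they can be compared with the stated requirements.
final class PreflightCheck {

    /** Parses a Java specification version: "1.8" gives 8, "11" gives 11. */
    static int parseMajor(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Versions before Java 9 were reported as "1.x".
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** True when the given specification version satisfies "JDK >= 8". */
    static boolean javaIsRecentEnough(String specVersion) {
        return parseMajor(specVersion) >= 8;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        System.out.println("Java specification version: " + spec
                + " (JDK >= 8: " + javaIsRecentEnough(spec) + ")");
        System.out.println("Max heap available to this JVM: " + maxHeapMb + " MB");
    }
}
```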

How to run

`./gradlew expand`

or on Windows:

`bin/expand_win.sh`

Output:

out/dict_corp_vis.txt - dictionary in visual (indented) format for review, analysis, or conversion
out/dict_corp_lt.txt - dictionary in flat format (used to prepare the morfologik dictionary that LanguageTool can use)
out/words.txt - list of all unique known word forms
out/words_spell.txt - list of word forms that are valid for spelling
out/lemmas.txt - list of unique lemmas
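
As an illustration of consuming the flat output, the sketch below groups word forms by lemma. It assumes each line of out/dict_corp_lt.txt has three whitespace-separated columns (word form, lemma, tags). That layout is an assumption, not something stated above, so check the generated file before relying on it. The class name `FlatDictSketch` and the sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch under an assumed line layout "form lemma tags"; verify against the real file.
final class FlatDictSketch {

    /** Groups "form [tags]" strings under their lemma, preserving input order. */
    static Map<String, List<String>> formsByLemma(List<String> lines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 3) {
                continue; // skip blank or malformed lines
            }
            result.computeIfAbsent(cols[1], key -> new ArrayList<>())
                  .add(cols[0] + " [" + cols[2] + "]");
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines (transliterated, with illustrative tags).
        List<String> sample = Arrays.asList(
                "knyha knyha noun:nom",
                "knyhy knyha noun:gen",
                "stil stil noun:nom");
        for (Map.Entry<String, List<String>> entry : formsByLemma(sample).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

For the real file, the lines could be read with `java.nio.file.Files.readAllLines` using UTF-8, keeping in mind that the full dictionary is large and a streaming read may be preferable.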

Building under docker

sudo docker build -t brown-uk/dict_uk .
sudo docker run -d --name dict_uk brown-uk/dict_uk /bin/bash
sudo docker cp dict_uk:/src/out/ ./out
sudo chown -R $USER: ./out
sudo docker stop dict_uk

License

Dictionary data are distributed under "Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License" (https://creativecommons.org/licenses/by-nc-sa/4.0/)

Software is distributed under GPLv3.

Note: derivative projects have different licenses

In addition, materials in this project may be used in https://voice.mozilla.org/uk and https://common-voice.github.io/sentence-collector/#/ in accordance with their licenses.

Derivative Projects

Corpus of Modern Ukrainian (БрУК, BrUK)

Goal: build an annotated corpus of modern Ukrainian (BrUK) of 1 million tokens, following the principles of the Brown Corpus

Versions

Version	Release Date
4.9.1	Mar 15, 2020
4.9.0	Mar 15, 2020
3.9.0	Sep 20, 2017
3.7.6	May 19, 2017
3.7.5	Apr 14, 2017
3.7.4	Apr 13, 2017
