
Lemmatization from Alaska using the Internet

Posted: Sun Jun 05, 2016 1:49 am
by Eugene Lutsenko
Is it possible to write a function in Alaska Xbase++ that uses the lemmatization services available on the Internet?
https://en.wikipedia.org/wiki/Lemmatisation
http://tools.k50project.ru/lemma/

Ideally, it would look like this:

Word_after_lemmatization = DC_Lemmatization(Word_to_lemmatization)

Re: Lemmatization from Alaska using the Internet

Posted: Wed Jun 08, 2016 12:38 pm
by Auge_Ohr
hi
Eugene Lutsenko wrote: Ideally, it looked like this:
Word_after_lemmatization = DC_Lemmatization(Word_to_lemmatization)
OK ... and which algorithm do you want to use, and for which language?

Did you try this sample?
c:\ALASKA\XPPW32\SOURCE\samples\basics\CAPI\soundex.c

Re: Lemmatization from Alaska using the Internet

Posted: Wed Jun 08, 2016 9:52 pm
by Eugene Lutsenko
Hey, Jimmy!
Nice to talk with you. I am mostly interested in lemmatization for Russian, but for other languages too. The principle is the same and very simple. I imagine it like this: there are two arrays:
- 1st: the words to be lemmatized, including words that are already in their base form;
- 2nd: the corresponding words after lemmatization.
We search the 1st array for the word passed to the function as a parameter. The function returns the word at the same position in the 2nd array. Naturally, words in the 2nd array will repeat, because many word forms share one lemma.

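FUNCTION DC_Lemmatization( Word_to_lemmatization, Array1, Array2 )
   // Array1: word forms; Array2: the lemma of each form, at the same position
   LOCAL Pos, Word_after_lemmatization

   // AScan() takes the array first, then what to search for. The code block
   // with == forces an exact match (a plain value search may also match a
   // longer word that starts with the search value when SET EXACT is OFF)
   Pos := AScan( Array1, {|x| x == Word_to_lemmatization } )
   IF Pos > 0
      Word_after_lemmatization := Array2[Pos]
   ELSE
      Word_after_lemmatization := 'ERROR'
   ENDIF

RETURN Word_after_lemmatization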
The problem is that I do not have these arrays, and it is not clear where to get them. But there are on-line services that do this. So I wondered whether it is possible to use them directly from Alaska.

I have not tried c:\ALASKA\XPPW32\SOURCE\samples\basics\CAPI\soundex.c yet.

Re: Lemmatization from Alaska using the Internet

Posted: Thu Jun 09, 2016 6:40 pm
by Auge_Ohr
Eugene Lutsenko wrote: ... that there are two arrays:
- 1st: words to lemmatization, including lemmatizirovannye;
- 2nd: by appropriate words after lemmatization.
It seems to me like a spellchecker with a dictionary ... see c:\ALASKA\XPPW32\SOURCE\samples\activex\spellchecker\spellchk.prg
Eugene Lutsenko wrote: The problem is that these arrays do not have, and where to get them is not clear. But there it on-line services, which it is doing. So I thought, what if there is a possibility to use them directly from Alaska.
So you want to use an online service, e.g. https://translate.google.com?
I do not know how to send data to the translator and get data back through the Google API.
Eugene Lutsenko wrote: c:\ALASKA\XPPW32\SOURCE\samples\basics\CAPI\soundex.c I have not tried
SOUNDEX is a different approach ... it is like the phonetic-based Chinese Pinyin input method.
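To show why SOUNDEX is a different tool from lemmatization: it maps words that sound alike to the same short code. It does not map a word form to its dictionary form. Below is a minimal C sketch of the common American Soundex rules. It was written independently for illustration and is not taken from the Alaska soundex.c sample. It only handles Latin letters, so it would not help with Russian text as-is.

```c
#include <ctype.h>
#include <string.h>

/* Maps a letter to its Soundex digit. 'h' marks H and W, which are skipped
   without separating equal codes; '0' marks vowels (and Y), which do. */
static char soundex_digit(char c)
{
    switch (tolower((unsigned char)c)) {
    case 'b': case 'f': case 'p': case 'v':
        return '1';
    case 'c': case 'g': case 'j': case 'k':
    case 'q': case 's': case 'x': case 'z':
        return '2';
    case 'd': case 't':
        return '3';
    case 'l':
        return '4';
    case 'm': case 'n':
        return '5';
    case 'r':
        return '6';
    case 'h': case 'w':
        return 'h';
    default:
        return '0';
    }
}

/* Writes the 4-character Soundex code of word into out (5 bytes incl. '\0').
   Non-letters are ignored; a word without letters gives an empty string. */
void soundex(const char *word, char out[5])
{
    int len = 0;
    char last;
    const char *p = word;

    while (*p && !isalpha((unsigned char)*p))
        p++;
    if (*p == '\0') {
        out[0] = '\0';
        return;
    }

    out[len++] = (char)toupper((unsigned char)*p);  /* keep the first letter */
    last = soundex_digit(*p);

    for (p++; *p && len < 4; p++) {
        char d;
        if (!isalpha((unsigned char)*p))
            continue;
        d = soundex_digit(*p);
        if (d == 'h')
            continue;                 /* H/W: skip, leave 'last' unchanged */
        if (d != '0' && d != last)
            out[len++] = d;           /* a new consonant code */
        last = d;                     /* a vowel resets 'last' */
    }
    while (len < 4)
        out[len++] = '0';             /* pad to 4 characters */
    out[4] = '\0';
}
```

For example, "Robert" and "Rupert" both give R163. However, "cat" gives C300 and "cats" gives C320. Soundex groups words by similar sound, not by a shared lemma.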

Re: Lemmatization from Alaska using the Internet

Posted: Wed Jun 15, 2016 10:01 am
by Eugene Lutsenko

Re: Lemmatization from Alaska using the Internet

Posted: Wed Jun 15, 2016 10:59 am
by Auge_Ohr
Eugene Lutsenko wrote: Here there is something for the Russian language:
We can try to help if you have an Xbase++ / Express++ "code" problem, but with the Russian "language" itself ...

Re: Lemmatization from Alaska using the Internet

Posted: Wed Jun 15, 2016 11:28 am
by Eugene Lutsenko
I will try to call these modules as external programs with the necessary parameters and have them write their results to text files. From those text files I can easily get what I need: the base (dictionary) form of each word. From that, I will try to build the database I need for lemmatization. Which language it is does not matter much.
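Here is a rough C sketch of the text-file step just described. It also builds the two parallel arrays from the earlier post. For illustration only, it assumes that the external tool writes one `form{lemma}` pair per line. Check the real output format of mystem or any other module in its documentation and adjust the parser. The names `LemmaTable`, `load_pairs` and `lemmatize`, and the size limits, are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PAIRS 1000   /* hypothetical capacity of the table */
#define MAX_WORD  64     /* bytes per word, including the terminating '\0' */

/* Two parallel arrays: lemmas[i] is the dictionary form of forms[i]. */
typedef struct {
    char forms[MAX_PAIRS][MAX_WORD];
    char lemmas[MAX_PAIRS][MAX_WORD];
    int count;
} LemmaTable;

/* Reads lines assumed to look like "form{lemma}" and fills the table.
   Lines without that shape, or with over-long words, are skipped.
   Returns the number of pairs loaded. */
int load_pairs(FILE *fp, LemmaTable *t)
{
    char line[2 * MAX_WORD + 8];

    t->count = 0;
    while (t->count < MAX_PAIRS && fgets(line, sizeof line, fp) != NULL) {
        char *open = strchr(line, '{');
        char *close = open ? strchr(open, '}') : NULL;
        size_t flen, llen;

        if (open == NULL || close == NULL || open == line || close == open + 1)
            continue;
        flen = (size_t)(open - line);
        llen = (size_t)(close - open - 1);
        if (flen >= MAX_WORD || llen >= MAX_WORD)
            continue;

        memcpy(t->forms[t->count], line, flen);
        t->forms[t->count][flen] = '\0';
        memcpy(t->lemmas[t->count], open + 1, llen);
        t->lemmas[t->count][llen] = '\0';
        t->count++;
    }
    return t->count;
}

/* Same idea as the ASCAN version: find the word in forms[] and return the
   lemma at the same position, or "ERROR" if it is not there.
   Matching is exact and byte-by-byte, so it is case-sensitive. */
const char *lemmatize(const LemmaTable *t, const char *word)
{
    int i;
    for (i = 0; i < t->count; i++)
        if (strcmp(t->forms[i], word) == 0)
            return t->lemmas[i];
    return "ERROR";
}
```

Because the comparison is byte-based, UTF-8 Russian words match as long as the file and the query use the same encoding. A linear search is fine for a small test. For a full dictionary, a sorted array with binary search or a hash table would be the usual choice.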

https://tech.yandex.ru/mystem/
http://cache-default05f.cdn.yandex.net/ ... -vegas.pdf

http://aidos.online/api.php
http://dev.aidos.online/brain.php

Re: Lemmatization from Alaska using the Internet

Posted: Sat Jun 18, 2016 10:10 pm
by Eugene Lutsenko
I found a very good lemmatization database, just what I wanted, but only for the Russian language:

https://habrahabr.ru/company/realweb/blog/265375/
https://yadi.sk/d/mElByZe4jg7Qb