Lemmatization from Alaska using the Internet
Posted: Sun Jun 05, 2016 1:49 am
by Eugene Lutsenko
Is it possible to write a function in Alaska that uses the lemmatization services available on the Internet?
https://en.wikipedia.org/wiki/Lemmatisation
http://tools.k50project.ru/lemma/
Ideally, it would look like this:
Word_after_lemmatization = DC_Lemmatization(Word_to_lemmatization)
Re: Lemmatization from Alaska using the Internet
Posted: Wed Jun 08, 2016 12:38 pm
by Auge_Ohr
hi
Eugene Lutsenko wrote: Ideally, it would look like this:
Word_after_lemmatization = DC_Lemmatization(Word_to_lemmatization)
ok ... and which algorithm do you want to use, and for which language?
did you try
c:\ALASKA\XPPW32\SOURCE\samples\basics\CAPI\soundex.c
Re: Lemmatization from Alaska using the Internet
Posted: Wed Jun 08, 2016 9:52 pm
by Eugene Lutsenko
Hey, Jimmy!
Nice to talk with you. I am mostly interested in lemmatization for the Russian language, but for other languages too. The principle is the same and very simple. I imagine that there are two arrays:
- 1st: the words to be lemmatized, including ones that are already lemmatized;
- 2nd: the corresponding words after lemmatization.
We search the 1st array for the word passed to the function as a parameter. The function returns the word from the 2nd array at the same position where the word was found. Clearly, words in the 2nd array will be repeated.
Code:
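FUNCTION DC_Lemmatization(Word_to_lemmatization)
   LOCAL Pos, Word_after_lemmatization
   // Array1 (word forms) and Array2 (lemmas) must be visible here, e.g. as PUBLIC
   // ASCAN() takes the array first; the code block with == forces an exact match,
   // because a plain string search can match by prefix when SET EXACT is OFF
   Pos = ASCAN(Array1, {|c| c == Word_to_lemmatization})
   IF Pos > 0
      Word_after_lemmatization = Array2[Pos]
   ELSE
      Word_after_lemmatization = 'ERROR'
   ENDIF
RETURN(Word_after_lemmatization)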
The problem is that I do not have these arrays, and it is not clear where to get them. But there are online services that do this. So I wondered whether it is possible to use them directly from Alaska.
I have not tried c:\ALASKA\XPPW32\SOURCE\samples\basics\CAPI\soundex.c
Re: Lemmatization from Alaska using the Internet
Posted: Thu Jun 09, 2016 6:40 pm
by Auge_Ohr
Eugene Lutsenko wrote: ... that there are two arrays:
- 1st: the words to be lemmatized, including ones that are already lemmatized;
- 2nd: the corresponding words after lemmatization.
it seems to me like a spellchecker with a data dictionary ... see c:\ALASKA\XPPW32\SOURCE\samples\activex\spellchecker\spellchk.prg
Eugene Lutsenko wrote: The problem is that I do not have these arrays, and it is not clear where to get them. But there are online services that do this. So I wondered whether it is possible to use them directly from Alaska.
so you want to use e.g.
https://translate.google.com
I do not know how to send data to, or get data back from, the translator through the Google API
Eugene Lutsenko wrote: I have not tried c:\ALASKA\XPPW32\SOURCE\samples\basics\CAPI\soundex.c
SOUNDEX is a different approach ... it is phonetic-based, like the Chinese Pinyin input method
Re: Lemmatization from Alaska using the Internet
Posted: Wed Jun 15, 2016 10:01 am
by Eugene Lutsenko
Re: Lemmatization from Alaska using the Internet
Posted: Wed Jun 15, 2016 10:59 am
by Auge_Ohr
Eugene Lutsenko wrote: Here is something for the Russian language:
we can try to help if you have an Xbase++ / Express++ "code" problem, but with a Russian "language" problem ...
Re: Lemmatization from Alaska using the Internet
Posted: Wed Jun 15, 2016 11:28 am
by Eugene Lutsenko
I will try to call these modules as external programs with the necessary parameters, so that they write their results to text files. From the text files I can easily get what I need: the base form of each word. On this basis I will try to build the database I need for lemmatization. Which language is used is not critical.
https://tech.yandex.ru/mystem/
http://cache-default05f.cdn.yandex.net/ ... -vegas.pdf
http://aidos.online/api.php
http://dev.aidos.online/brain.php
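The second half of that plan (reading the tool's text output and filling the two lookup arrays) could be sketched in Xbase++ roughly as follows. This is only a sketch under assumptions: the file name lemmas.txt, the helper name LoadLemmaFile() and the one-entry-per-line layout "wordform{lemma}" are all hypothetical, so the parsing has to be adapted to whatever the external program really writes. It also assumes the external program has already been run.

```xbase
// Sketch: build the two lookup arrays from a text file written by an
// external lemmatizer. ASSUMPTION: one entry per line, "wordform{lemma}".
PROCEDURE Main
   LOCAL aForms := {}, aLemmas := {}

   LoadLemmaFile( "lemmas.txt", aForms, aLemmas )   // hypothetical file name
   ? Len( aForms ), "word forms loaded"
RETURN

// Hypothetical helper: appends word forms to aForms and lemmas to aLemmas
FUNCTION LoadLemmaFile( cFile, aForms, aLemmas )
   LOCAL cText := MemoRead( cFile )      // "" if the file cannot be read
   LOCAL cLine, nEol, nOpen, nClose

   // Drop carriage returns so that splitting on Chr(10) is enough
   cText := StrTran( cText, Chr(13), "" )

   DO WHILE Len( cText ) > 0
      // Cut the next line off the front of the text
      nEol := At( Chr(10), cText )
      IF nEol == 0
         cLine := cText
         cText := ""
      ELSE
         cLine := Left( cText, nEol - 1 )
         cText := SubStr( cText, nEol + 1 )
      ENDIF

      // Keep only lines that have a word form and a non-empty {lemma}
      nOpen  := At( "{", cLine )
      nClose := At( "}", cLine )
      IF nOpen > 1 .AND. nClose > nOpen + 1
         AAdd( aForms,  AllTrim( Left( cLine, nOpen - 1 ) ) )
         AAdd( aLemmas, SubStr( cLine, nOpen + 1, nClose - nOpen - 1 ) )
      ENDIF
   ENDDO
RETURN Len( aForms )
```

The two arrays stay aligned by position, so aForms and aLemmas can play the role of Array1 and Array2 in a lookup function such as DC_Lemmatization(). For a large dictionary a DBF with an index on the word form would probably scale better than ASCAN() over an array.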
Re: Lemmatization from Alaska using the Internet
Posted: Sat Jun 18, 2016 10:10 pm
by Eugene Lutsenko
I found a very good database for lemmatization, just the kind I wanted, but only for the Russian language:
https://habrahabr.ru/company/realweb/blog/265375/
https://yadi.sk/d/mElByZe4jg7Qb