Lemmatization from Alaska using the Internet

This forum is for eXpress++ general support.
Eugene Lutsenko
Posts: 1649
Joined: Sat Feb 04, 2012 2:23 am
Location: Russia, Southern federal district, city of Krasnodar

Lemmatization from Alaska using the Internet

#1 Post by Eugene Lutsenko »

Is it possible to write a function in Alaska (Xbase++) that performs lemmatization using the services available on the Internet?
https://en.wikipedia.org/wiki/Lemmatisation
http://tools.k50project.ru/lemma/

Ideally, it would look like this:

Word_after_lemmatization = DC_Lemmatization(Word_to_lemmatization)

Auge_Ohr
Posts: 1428
Joined: Wed Feb 24, 2010 3:44 pm

Re: Lemmatization from Alaska using the Internet

#2 Post by Auge_Ohr »

hi
Eugene Lutsenko wrote: Ideally, it would look like this:
Word_after_lemmatization = DC_Lemmatization(Word_to_lemmatization)
OK ... and which algorithm do you want to use, and for which language?

Did you try c:\ALASKA\XPPW32\SOURCE\samples\basics\CAPI\soundex.c?
greetings by OHR
Jimmy

Re: Lemmatization from Alaska using the Internet

#3 Post by Eugene Lutsenko »

Hey, Jimmy!
Nice to talk with you. I am mostly interested in lemmatization for the Russian language, but for other languages too. The principle is the same and very simple. As I imagine it, there are two arrays:
- 1st: the words to be lemmatized, including words that are already lemmatized;
- 2nd: the corresponding words after lemmatization.
We search the 1st array for the word passed to the function as a parameter. The function returns the word from the 2nd array at the same position where the word was found. Clearly, words in the 2nd array will be repeated.

Code: Select all

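FUNCTION DC_Lemmatization(Word_to_lemmatization)
LOCAL Pos, Word_after_lemmatization

// Array1 (word forms) and Array2 (lemmas) are assumed to be visible here,
// e.g. declared elsewhere as PUBLIC or STATIC arrays of equal length.
// ASCAN() takes the array first; the code block with == forces an exact match.
Pos = ASCAN(Array1, {|c| c == Word_to_lemmatization})
IF Pos > 0
   Word_after_lemmatization = Array2[Pos]
ELSE
   Word_after_lemmatization = 'ERROR'
ENDIF

RETURN(Word_after_lemmatization)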
The problem is that I do not have these arrays, and it is not clear where to get them. But there are online services that do this. So I thought: what if it is possible to use them directly from Alaska?
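
Until a real dictionary is available, the two-array lookup described above can be tried with a small hand-made word list. The following is only a minimal sketch: the function name LemmaLookup() and the English sample words are hypothetical, and Lower() is assumed to handle the characters of the target language under the active code page.

```xbase
// Sketch only: LemmaLookup() and the sample word lists are hypothetical.
PROCEDURE Main()
   LOCAL aForms  := { "cats", "cat", "running", "ran", "run" }   // 1st array: word forms
   LOCAL aLemmas := { "cat",  "cat", "run",     "run", "run" }   // 2nd array: lemmas (repeated)

   ? LemmaLookup( "Running", aForms, aLemmas )   // known word form
   ? LemmaLookup( "dogs",    aForms, aLemmas )   // word form not in the list
RETURN

FUNCTION LemmaLookup( cWord, aForms, aLemmas )
   LOCAL cKey := Lower( AllTrim( cWord ) )
   // A code block with == forces an exact match regardless of SET EXACT;
   // a plain AScan( aForms, cKey ) could match "cats" when searching for "cat".
   LOCAL nPos := AScan( aForms, {|c| c == cKey } )
RETURN IIf( nPos > 0, aLemmas[nPos], "" )
```

Here the first call finds "running" at position 3 and returns the lemma stored at the same position in the second array; the second call returns an empty string, which lets the caller tell "not found" apart from a real lemma.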

I have not tried c:\ALASKA\XPPW32\SOURCE\samples\basics\CAPI\soundex.c.

Re: Lemmatization from Alaska using the Internet

#4 Post by Auge_Ohr »

Eugene Lutsenko wrote: ... there are two arrays:
- 1st: the words to be lemmatized, including words that are already lemmatized;
- 2nd: the corresponding words after lemmatization.
It seems to me like a spellchecker with a data dictionary ... c:\ALASKA\XPPW32\SOURCE\samples\activex\spellchecker\spellchk.prg
Eugene Lutsenko wrote: The problem is that I do not have these arrays, and it is not clear where to get them. But there are online services that do this. So I thought: what if it is possible to use them directly from Alaska?
So you want to use e.g. https://translate.google.com ?
I do not know how to send data to, or get data from, the translator through the Google API.
Eugene Lutsenko wrote: I have not tried c:\ALASKA\XPPW32\SOURCE\samples\basics\CAPI\soundex.c
SOUNDEX is a different approach ... it is like the phonetic-based Chinese Pinyin input method.
greetings by OHR
Jimmy


Re: Lemmatization from Alaska using the Internet

#6 Post by Auge_Ohr »

Eugene Lutsenko wrote: Here there is something for the Russian language:
We can try to help if you have an Xbase++ / eXpress++ "code" problem, but with the Russian "language" ...
greetings by OHR
Jimmy

Re: Lemmatization from Alaska using the Internet

#7 Post by Eugene Lutsenko »

I will try to call these modules as external programs with the necessary parameters and have them write their results to text files. From the text files I will have no problem getting what I need: the base (initial) form of the word. On this basis, I will try to build the database that I need for lemmatization. Which language it is for is not critical.
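
The step of turning such a text file into the two lookup arrays could look roughly like the sketch below. It is not tied to any particular tool: the function name LoadLemmaPairs(), the file name, and the file layout (one pair per line, word form, a TAB character, then the lemma) are all assumptions made for illustration, so the parsing would have to be adapted to whatever the external program actually writes.

```xbase
// Sketch only: LoadLemmaPairs() and the "wordform<TAB>lemma" layout are assumptions.
// Appends the pairs found in cFile to aForms / aLemmas and returns the pair count.
FUNCTION LoadLemmaPairs( cFile, aForms, aLemmas )
   LOCAL cText := StrTran( MemoRead( cFile ), Chr(13), "" )   // drop CRs, keep LFs
   LOCAL cLine, nEol, nTab

   DO WHILE Len( cText ) > 0
      nEol := At( Chr(10), cText )            // end of the current line
      IF nEol == 0
         cLine := cText                       // last line without a line feed
         cText := ""
      ELSE
         cLine := Left( cText, nEol - 1 )
         cText := SubStr( cText, nEol + 1 )
      ENDIF

      nTab := At( Chr(9), cLine )             // separator between form and lemma
      IF nTab > 1 .AND. nTab < Len( cLine )   // skip empty or malformed lines
         AAdd( aForms,  Lower( AllTrim( Left( cLine, nTab - 1 ) ) ) )
         AAdd( aLemmas, AllTrim( SubStr( cLine, nTab + 1 ) ) )
      ENDIF
   ENDDO
RETURN Len( aForms )
```

A call such as LoadLemmaPairs( "lemmas.txt", aForms, aLemmas ) with two empty arrays fills both in step, because arrays are passed by reference. For a dictionary with many thousands of entries, a DBF table indexed on the word form would probably be more practical than arrays searched with AScan().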

https://tech.yandex.ru/mystem/
http://cache-default05f.cdn.yandex.net/ ... -vegas.pdf

http://aidos.online/api.php
http://dev.aidos.online/brain.php

Re: Lemmatization from Alaska using the Internet

#8 Post by Eugene Lutsenko »

I found a very good database for lemmatization, just the kind I wanted, but only for the Russian language:

https://habrahabr.ru/company/realweb/blog/265375/
https://yadi.sk/d/mElByZe4jg7Qb
