M. GADRI Said

MCA

Directory of teachers

Department

Informatics Department

Research Interests

Artificial Intelligence (AI), Text Mining (TM), Machine Learning (ML), Deep Learning (DL), Natural Language Processing (NLP), Information Retrieval (IR)

Contact Info

University of M'Sila, Algeria


Recent Publications

2024-09-24

A New Diagnostic Tool for Cutaneous Leishmaniasis Based on Artificial Intelligence

Background: Cutaneous leishmaniasis (CL) is a parasitic disease caused by protozoan parasites of the genus Leishmania, leading to significant morbidity in endemic regions. While effective, traditional diagnostic methods often suffer from limitations such as the requirement for specialized expertise and prolonged processing times. Recently, artificial intelligence (AI) methodologies have emerged to enhance the diagnostic accuracy and efficiency for CL.
Objective: This project aims to develop, and make available to biologists, a new method and tool for diagnosing cutaneous leishmaniasis that is faster, more efficient, and more precise, based on the latest techniques of artificial intelligence (AI) and computer vision (CV).
Methods: We used a deep learning object detection model (YOLOv8) to detect Leishmania parasite bodies in microscopic images. The model was trained on microscopic images collected at the Algerian Pasteur Institute, M'sila annex. We then deployed the model in a mobile application to validate its performance.
Results: YOLOv8 detected Leishmania parasite bodies in microscopic images with an accuracy of 97% on the entire test dataset.
Conclusion: This research demonstrates the significant potential of AI-based object detection models, particularly YOLOv8, for the accurate detection of Leishmania parasites in microscopic images. The results pave the way for promising clinical applications and further research in this field.
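The abstract reports a single 97% figure without saying how detections were scored. As a hedged illustration only, the sketch below shows one common way to score an object detector such as YOLOv8 against annotated parasite bodies: greedily match each predicted box to an unmatched ground-truth box when their intersection over union (IoU) reaches a threshold, then compute precision and recall. The `(x1, y1, x2, y2)` box format, the 0.5 threshold, and the function names are assumptions, not details taken from the study.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Tuple[Box, float]], truths: List[Box],
                     iou_threshold: float = 0.5) -> Tuple[float, float]:
    """Greedy matching: higher-confidence predictions claim ground-truth boxes first.

    preds:  (box, confidence) pairs from the detector.
    truths: annotated parasite boxes for the same image.
    """
    matched = set()
    true_positives = 0
    for box, _conf in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_threshold
        for j, gt in enumerate(truths):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


truths = [(10, 10, 20, 20), (30, 30, 40, 40)]
preds = [((11, 11, 21, 21), 0.9), ((50, 50, 60, 60), 0.4)]
print(precision_recall(preds, truths))  # (0.5, 0.5)
```

Here the first prediction overlaps a true parasite box with IoU 81/119 (about 0.68) and counts as a hit; the second overlaps nothing, so one of two predictions is correct and one of two parasites is found.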
Citation

M. GADRI Said, (2024-09-24), "A New Diagnostic Tool for Cutaneous Leishmaniasis Based on Artificial Intelligence", [national] Université Mohamed Boudiaf de M'sila

2023-11-19

Predicting Antibiotic Resistance in Bacteria Using Artificial Intelligence

Antibiotic resistance has become a critical global health problem because of the ineffectiveness of antibiotics, the spread of resistant bacteria, and limited treatment options, which calls for innovative approaches to predict and combat this phenomenon. In our study, machine learning and deep learning algorithms were used to develop predictive models based on various features derived from genomic, phenotypic, and clinical data. We developed a model that predicts antibiotic resistance in Escherichia coli bacteria, enabling accurate and rapid identification of resistant bacterial strains. This model proved highly effective and accurate in predicting antibiotic resistance.

Keywords: Resistant bacteria, Antibiotics, Escherichia coli, Machine learning, Deep learning, Artificial neural networks.
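The abstract does not name the exact models or features. As a minimal sketch, assuming each E. coli isolate is encoded as a binary presence/absence vector of resistance-associated genes (the gene names below are hypothetical placeholders and the rows are invented toy values, not study data), a logistic-regression classifier trained by stochastic gradient descent looks like this. It stands in for whichever ML/DL models the study actually used.

```python
import math
import random
from typing import List, Sequence

# Hypothetical feature order: presence (1) or absence (0) of three resistance-associated genes.
GENES = ["gene_A", "gene_B", "gene_C"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: List[Sequence[float]], y: List[int], lr: float = 0.5,
                   epochs: int = 200, seed: int = 0) -> List[float]:
    """Fit [bias, w_1, ..., w_d] by stochastic gradient descent on the log-loss."""
    rng = random.Random(seed)
    weights = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = weights[0] + sum(w * x for w, x in zip(weights[1:], X[i]))
            error = sigmoid(z) - y[i]  # derivative of the log-loss with respect to z
            weights[0] -= lr * error
            for k, x in enumerate(X[i]):
                weights[k + 1] -= lr * error * x
    return weights


def predict_resistant(weights: List[float], x: Sequence[float]) -> bool:
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(z) >= 0.5


# Toy isolates (invented): label 1 = resistant, here determined by gene_A only.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 1], [0, 0, 0], [0, 1, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
weights = train_logistic(X, y)
print(dict(zip(["bias"] + GENES, (round(w, 2) for w in weights))))
print([predict_resistant(weights, x) for x in X])
```

On this toy data the learned weight for `gene_A` dominates, which is the kind of per-gene signal a linear model exposes; deep models trade that interpretability for capacity.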
Citation

M. GADRI Said, (2023-11-19), "Predicting Antibiotic Resistance in Bacteria Using Artificial Intelligence", [national] Mohamed Boudiaf University

2023-10-23

Keynote talk entitled “Artificial Intelligence and Mathematics”

Relationship between Mathematics and AI
Citation

M. GADRI Said, (2023-10-23), "keynote talk entitled “Artificial Intelligence and Mathematics”", [national] Artificial Intelligence and Mathematics MMS'23 , ENS, Bou Saada, Algeria

2023-09-20

Invited Guest

How to prepare an article?
How to publish in Q1 journals (Nature & Science)?
Citation

M. GADRI Said, (2023-09-20), "Invited Guest", [national] Seminar on Submission of a Scientific Article to Nature or Science Journals , Cerist, Algiers, Algeria

2023-06-28

An Automatic System for Ensuring Car Driver Safety

This project proposes an innovative, advanced system for smart vehicles and transportation. The system uses AI-based techniques to monitor the driver's state, the LPG level, the internal temperature and humidity, the safe distance from other vehicles, speeding, and the distance traveled that requires a rest stop, among other options. Its objectives are to ensure the safety of the driver, the vehicle, and other vehicles on the road while minimizing the human and material damage caused by road accidents. The project can be considered an important step toward more effective and intelligent road safety solutions.
Keywords: Drowsiness Detection, Deep Learning, CNN, Computer Vision, Road Safety, Leak Detection, Smart Vehicles.
Citation

M. GADRI Said, (2023-06-28), "An Automatic System for Ensuring Car Driver Safety", [national] Mohamed Boudiaf University

2023-03-16

Artificial Intelligence : Concepts, Subfields, Applications, Advances

Basics and Fundamental Concepts, Subfields, Advances
Citation

M. GADRI Said, (2023-03-16), "Artificial Intelligence : Concepts, Subfields, Applications, Advances", [national] Artificial Intelligence Week 2023 , Mohamed Boudiaf University of M’sila

2023

Artificial Intelligence and its Applications in the Field of Humanities and Social Sciences ChatGPT as Model

AI is everywhere.
We are heading towards fully intelligent societies.
In a few years, our whole life will be artificial.
We must join this evolution and this technological revolution as quickly as possible.
Include AI as a fundamental subject in all specialties.
Citation

M. GADRI Said, (2023), "Artificial Intelligence and its Applications in the Field of Humanities and Social Sciences ChatGPT as Model", [national] Faculty of Humanities and Sociology

Drowsiness Detection System: Developing an Innovative Model Based on Deep Learning Approach

A drowsiness detection system is an innovative solution for drivers, especially those who drive day and night. It detects the driver's drowsiness and gives feedback before it becomes dangerous: if it detects that the driver is getting drowsy, it warns him or her with a warning sound. In this paper, we propose a smart, innovative system based on a deep learning (DL) approach that efficiently detects driver drowsiness. The main idea behind the system is eye-movement analysis. In the first step, we propose a novel deep learning approach based on a Convolutional Neural Network (CNN) to detect drowsiness.
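The abstract describes a CNN that analyses the eyes and a warning sound when the driver becomes drowsy, but not how per-frame predictions become an alert. A minimal sketch, assuming a hypothetical per-frame classifier that outputs the probability that the eyes are closed: raise an alert only after the eyes stay closed for a run of consecutive frames, so that ordinary blinks are ignored. The threshold and run length are illustrative assumptions.

```python
from typing import Iterable, List


def drowsiness_alerts(closed_probs: Iterable[float], threshold: float = 0.5,
                      min_consecutive: int = 3) -> List[int]:
    """Return the frame indices at which an alert would fire.

    closed_probs: per-frame probability that the eyes are closed, as produced by a
    (hypothetical) CNN eye-state classifier.
    An alert fires once per run, when a run of closed-eye frames reaches min_consecutive.
    """
    alerts = []
    run = 0
    for i, p in enumerate(closed_probs):
        if p >= threshold:
            run += 1
            if run == min_consecutive:
                alerts.append(i)
        else:
            run = 0  # eyes open again: reset the run
    return alerts


probs = [0.1, 0.9, 0.2, 0.8, 0.9, 0.95, 0.97, 0.1, 0.6, 0.7, 0.8]
print(drowsiness_alerts(probs))  # [5, 10]
```

The isolated closed frame at index 1 behaves like a blink and triggers nothing. At a typical camera frame rate, `min_consecutive` would be set much higher than 3 so that it corresponds to a meaningful eye-closure duration.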
Citation

M. GADRI Said, (2023), "Drowsiness Detection System: Developing an Innovative Model Based on Deep Learning Approach", [international] The First International Workshop on Machine Learning and Deep Learning WMLDL 2023 , Mohamed Boudiaf University of M'sila, Algeria

Invited Participant

This meeting is a cooperation between the University of Oued Souf and a research laboratory of the University of Paris XII.
Citation

M. GADRI Said, (2023), "Invited Participant", [international] Smart Agri-Tech’23 , University of El-Oued, Algeria

2022

Developing a Multilingual Stemmer for the Requirement of Text Categorization and Information Retrieval

Information retrieval (IR) is the process of finding information (generally documents) that matches the user's needs. One way to improve search effectiveness, as well as the quality of text categorization (TC), is to build an effective stemmer that helps match users' queries with relevant documents in IR and reduces the size of the textual representation in TC. This has always been an interesting research topic in IR and TC. Stemming can be defined as the process of reducing inflected and derived words to their reduced forms (stems or roots). Many stemmers have been developed for different languages, but they still have many weaknesses and problems. In the present work, we have developed a multilingual stemming approach based on the extraction of the word root that exploits the technique of character n-grams. Our experiments were conducted on three languages: Arabic, English, and French.
Keywords: Information retrieval, Machine learning, Natural language processing, Root extraction, Stemming
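The abstract says the stemmer extracts the word root using character n-grams but does not give the scoring rule. One common n-gram formulation, used here purely as an assumption, compares the character bigrams of a word with those of each entry in a list of candidate roots using the Dice coefficient and returns the best-scoring root. The root list and the `min_score` cut-off are hypothetical; a real system would need a root list per language.

```python
from typing import Iterable, Optional, Set


def bigrams(word: str) -> Set[str]:
    """Overlapping character bigrams, e.g. 'root' -> {'ro', 'oo', 'ot'}."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|); 0.0 when both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def ngram_root(word: str, roots: Iterable[str], min_score: float = 0.4) -> Optional[str]:
    """Return the root whose bigram set best matches the word's, or None below min_score."""
    word_grams = bigrams(word.lower())
    best_root, best_score = None, min_score
    for root in roots:
        score = dice(word_grams, bigrams(root.lower()))
        if score > best_score:
            best_root, best_score = root, score
    return best_root


# Hypothetical toy root list; a real stemmer would need a root list per language.
ROOTS = ["connect", "relate", "retrieve"]
for w in ["connections", "related", "retrieval", "xyz"]:
    print(w, "->", ngram_root(w, ROOTS))
```

Because the method only compares character strings, the same code runs unchanged on any script, which is what makes n-gram matching attractive for a language-independent stemmer. For example, "connections" shares all 6 bigrams of "connect", giving a Dice score of 12/15 = 0.8.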
Citation

M. GADRI Said, (2022), "Developing a Multilingual Stemmer for the Requirement of Text Categorization and Information Retrieval", [national] International Journal on Electrical Engineering and Informatics , School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia

Invited participant

Conference of Algerian researchers abroad
Citation

M. GADRI Said, (2022), "Invited participant", [national] TC’22 , Moufdi Zakaria Palace, Algiers, Algeria

2021

Habilitation Thesis

Habilitation thesis
Citation

M. GADRI Said, (2021), "Habilitation Thesis", [national] University Mohamed Boudiaf of M'sila , University Mohamed Boudiaf of M'sila

An Efficient System to Predict Customers’ Satisfaction on Touristic Services Using ML and DL Approaches

In the last decade, neural networks (NNs) have become a popular solution for many applications in artificial intelligence (AI). In tourism, for instance, most companies have professional websites where customers can book flights, bus and taxi trips, hotels, restaurants, etc. Customers can also compare services in terms of price, location, service quality, and other criteria. In this work, the dataset consists of a sample of hotel reviews written by customers who made recent reservations. Analyzing these reviews helps companies determine whether their services suit customers and meet their needs, and to what degree: are customers happy or not, satisfied or not? Our main objective is to develop an efficient, intelligent system based on NNs that predicts how customers feel about the services provided. To this end, we first performed the classification task with several machine learning algorithms, including LDA, KNN, CART, NB, and SVM. In the second stage, we proposed a deep neural network (DNN) model to perform the same task. Finally, we made a short comparison between the different algorithms. For the implementation, we used the Python language and the TensorFlow and Keras libraries.

Keywords: machine learning, deep learning, Artificial Neural Networks, Natural Language Processing, Social media.
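Naive Bayes (NB) is one of the classical algorithms this abstract lists alongside LDA, KNN, CART, and SVM. A minimal multinomial NB with Laplace smoothing over whitespace-tokenized reviews, written without external libraries, shows how such a baseline predicts "satisfied" versus "unsatisfied". The tokenization, the smoothing constant `alpha`, the labels, and the toy reviews are assumptions for illustration, not the authors' dataset or pipeline.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, float], Dict[str, Counter], Set[str], float]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def train_nb(docs: List[Tuple[str, str]], alpha: float = 1.0) -> Model:
    """docs: (review_text, label) pairs. Returns log-priors, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    return log_priors, word_counts, vocab, alpha


def predict_nb(model: Model, text: str) -> str:
    log_priors, word_counts, vocab, alpha = model
    scores = {}
    for c, log_prior in log_priors.items():
        total = sum(word_counts[c].values())
        score = log_prior
        for token in tokenize(text):
            if token in vocab:  # ignore words never seen in training
                score += math.log((word_counts[c][token] + alpha) / (total + alpha * len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


reviews = [
    ("great room and friendly staff", "satisfied"),
    ("clean room great location", "satisfied"),
    ("dirty room and rude staff", "unsatisfied"),
    ("noisy and dirty never again", "unsatisfied"),
]
model = train_nb(reviews)
print(predict_nb(model, "friendly staff clean room"))  # satisfied
print(predict_nb(model, "rude and dirty"))  # unsatisfied
```

Working in log-probabilities avoids numerical underflow when reviews are long; the smoothing term keeps unseen word/class pairs from zeroing out a class.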
Citation

M. GADRI Said, (2021), "An Efficient System to Predict Customers’ Satisfaction on Touristic Services Using ML and DL Approaches", [international] 22nd Arabic International Conference on Information Technology (ACIT 2021), 21-23 Dec, IEEE Conference, Sultan Qaboos University, Sultanate of Oman , Sultan Qaboos University, Sultanate of Oman

Handwritten Digit Recognition: Developing an Efficient ML and DL Model to Recognize Handwritten Digits

Deep learning (DL) is a recent subfield of machine learning (ML) that has been used over the last decades to develop more sophisticated algorithms achieving high performance in popular recognition fields such as pattern recognition, computer vision, and image classification. Among the methods used in DL, Convolutional Neural Networks (CNNs) can be considered the most widely used. In the present work, we developed an automatic classifier that assigns grayscale images of handwritten digits to one of 10 classes (the digits 0 to 9). For this purpose, we used ML and DL approaches. First, we performed the classification task with several ML algorithms, including LR, LDA, KNN, CART, NB, and SVM. Second, we proposed a new CNN model composed of several convolutional layers. Finally, we compared the different algorithms.
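The CNN described above stacks convolutional layers before assigning a digit class. To make the two core operations concrete, here is a dependency-free sketch of a single-channel "valid" 2D convolution (computed as cross-correlation, as deep learning libraries such as Keras do) followed by ReLU and 2×2 max pooling. The kernel and input are illustrative assumptions; a real model learns many kernels per layer and would use an optimized library.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' cross-correlation with stride 1: output is (H-kh+1) x (W-kw+1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def max_pool2x2(m: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    return [
        [max(m[i][j], m[i][j + 1], m[i + 1][j], m[i + 1][j + 1])
         for j in range(0, len(m[0]) - 1, 2)]
        for i in range(0, len(m) - 1, 2)
    ]


# Toy 5x5 image with a vertical edge, and a 2x2 vertical-edge kernel (illustrative values).
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
kernel = [[-1.0, 1.0], [-1.0, 1.0]]
feature_map = relu(conv2d_valid(image, kernel))  # 4x4, strongest response at the edge
print(max_pool2x2(feature_map))  # [[2.0, 0.0], [2.0, 0.0]]
```

Pooling halves each spatial dimension while keeping the strongest edge response, which is why CNNs typically alternate convolution and pooling before the classification layers.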
Citation

M. GADRI Said, (2021), "Handwritten Digit Recognition: Developing an Efficient ML and DL Model to Recognize Handwritten Digits", [national] CNIATI'21 Conférence Nationale sur l’Intelligence artificielle et les technologies de l'information, 24 May 2021, University Chadli Ben Djedid, Al-Taref , University Chadli Ben Djedid, Al-Taref.

Developing an efficient Predictive model based on ML and DL approaches to detect diabetes

During the last decade, important progress has been made in machine learning (ML), especially with the emergence of a new subfield called deep learning (DL) and of Convolutional Neural Networks (CNNs). This trend has produced much more sophisticated algorithms that achieve high performance in many disciplines, such as pattern recognition, image classification, and computer vision, as well as other supervised and unsupervised classification tasks. In this work, we developed an automatic classifier that classifies a large number of diabetic patients based on blood characteristics, using ML and DL approaches. Initially, we performed the classification task with several ML algorithms. Then we proposed a simple DNN model composed of several layers. Finally, we compared the ML and DL algorithms, as well as our model with other existing models. For the implementation, we used Python, TensorFlow, and Keras, which are the most widely used tools in the field.
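The abstract compares several ML algorithms with a DNN but does not state the evaluation protocol. A common protocol, assumed here, is k-fold cross-validation: each model is trained on k-1 folds and scored on the held-out fold, and the mean accuracies are compared. The sketch compares a majority-class baseline with a 1-nearest-neighbour classifier on invented two-feature toy rows; the data, the fold count, and both classifiers are stand-ins for the real dataset and models.

```python
from collections import Counter
from typing import Callable, List, Tuple

Row = Tuple[float, ...]
Classifier = Callable[[List[Row], List[int], Row], int]


def majority_baseline(train_X: List[Row], train_y: List[int], x: Row) -> int:
    """Always predict the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def one_nn(train_X: List[Row], train_y: List[int], x: Row) -> int:
    """Predict the label of the closest training row (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]


def k_fold_accuracy(X: List[Row], y: List[int], clf: Classifier, k: int = 3) -> float:
    """Mean held-out accuracy over k contiguous folds (shuffle the data beforehand in practice)."""
    n = len(X)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    accuracies, start = [], 0
    for size in fold_sizes:
        test_idx = set(range(start, start + size))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        correct = sum(clf(train_X, train_y, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / size)
        start += size
    return sum(accuracies) / k


# Invented toy rows: two numeric features per patient, label 1 = diabetic.
X = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (5.1, 4.8), (0.8, 1.1), (4.9, 5.2)]
y = [0, 1, 0, 1, 0, 1]
print(k_fold_accuracy(X, y, majority_baseline), k_fold_accuracy(X, y, one_nn))  # 0.5 1.0
```

Reporting a trivial baseline next to the real models makes it clear how much each model actually learns beyond the class balance.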
Citation

M. GADRI Said, (2021), "Developing an efficient Predictive model based on ML and DL approaches to detect diabetes", [national] The International Journal of Computing and Informatics Informatica , Slovensko društvo INFORMATIKA

Sentiment Analysis: Developing an Efficient Model Based on Machine Learning and Deep Learning Approaches

Sentiment analysis is a subfield of text mining: it is the process of categorizing the opinions expressed in a piece of text. A simple form of such analysis is predicting whether the opinion about something is positive or negative (polarity). The present paper proposes an efficient sentiment analysis model based on machine learning (ML) and deep learning (DL) approaches. A DNN (Deep Neural Network) model is used to extract relevant features from customer reviews; it is trained on most of the samples in the dataset, validated on a small subset called the test set, and then used to compute the accuracy of sentiment classification. For the implementation, we used the Python language and the TensorFlow and Keras libraries.
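The DNN in this abstract maps features extracted from a review to a polarity. Setting training aside, a fully connected network is a chain of affine maps and nonlinearities; the sketch below runs a forward pass through one ReLU hidden layer and a sigmoid output read as P(positive). The bag-of-words vocabulary and the hand-set weights are hypothetical, chosen only so the example is easy to check; in the described work the weights would be learned with TensorFlow/Keras.

```python
import math
from typing import List

VOCAB = ["good", "great", "bad", "awful"]  # hypothetical toy vocabulary


def bag_of_words(text: str) -> List[float]:
    """Count of each vocabulary word in the text."""
    tokens = text.lower().split()
    return [float(tokens.count(w)) for w in VOCAB]


def dense(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Affine layer: one output per row of W."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(x: List[float]) -> float:
    # Hand-set weights standing in for trained ones:
    # hidden unit 0 responds to positive words, hidden unit 1 to negative words.
    W1 = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    b1 = [0.0, 0.0]
    W2 = [[2.0, -2.0]]
    b2 = [0.0]
    hidden = [max(0.0, h) for h in dense(x, W1, b1)]  # ReLU
    z = dense(hidden, W2, b2)[0]
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> P(positive)


for review in ["good great", "awful", "good bad"]:
    print(review, round(forward(bag_of_words(review)), 3))
```

A review with balanced positive and negative words lands exactly at 0.5, the decision boundary; training adjusts `W1`, `b1`, `W2`, and `b2` so that this boundary separates real reviews.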
Citation

M. GADRI Said, (2021), "Sentiment Analysis: Developing an Efficient Model Based on Machine Learning and Deep Learning Approaches", [international] Intelligent Computing & Optimization. ICO 2021. 30-31 Dec, 2021 , Thailand

Efficient Traffic Signs Recognition Based on CNN Model for Self-Driving Cars

Self-driving (autonomous) cars provide many benefits for humanity, such as fewer deaths and injuries in road accidents, less air pollution, and better quality of car control. For this purpose, cameras and sensors are placed on the car, and an efficient control system must be set up. This system receives images from the different cameras and/or sensors in real time, especially images of traffic signs, and processes them to enable highly autonomous control and driving of the car. Among the most promising algorithms used in this field are convolutional neural networks (CNNs). In the present work, we propose a CNN model composed of several convolutional layers, max-pooling layers, and fully connected layers. As programming tools, we used Python, TensorFlow, and Keras, which are currently the most widely used in the field.
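The abstract names the layer types (convolutional, max-pooling, fully connected) but not their sizes. When designing such a stack, it helps to track how each layer changes the spatial size: for an unpadded ("valid") convolution or pooling window of size k with stride s, an n×n input becomes floor((n - k) / s) + 1 per side. The architecture below (32×32 RGB input, 3×3 convolutions, 2×2 pooling, and a 43-class output, the class count of the public GTSRB traffic-sign benchmark) is a hypothetical example, not the authors' model.

```python
from typing import List, Tuple

# (layer type, window size, stride, filters) for a HYPOTHETICAL architecture, not the paper's.
LAYERS: List[Tuple[str, int, int, int]] = [
    ("conv", 3, 1, 32),
    ("conv", 3, 1, 32),
    ("pool", 2, 2, 0),
    ("conv", 3, 1, 64),
    ("pool", 2, 2, 0),
]


def out_size(n: int, k: int, s: int) -> int:
    """Spatial size after a 'valid' (unpadded) window of size k and stride s."""
    return (n - k) // s + 1


def trace_shapes(h: int, w: int, c: int, layers: List[Tuple[str, int, int, int]],
                 num_classes: int) -> Tuple[List[Tuple[int, int, int]], int, int]:
    shapes = [(h, w, c)]
    for kind, k, s, filters in layers:
        h, w = out_size(h, k, s), out_size(w, k, s)
        if kind == "conv":
            c = filters  # one output channel per filter; pooling keeps the channel count
        shapes.append((h, w, c))
    flat = h * w * c  # length of the flattened vector entering the fully connected layer
    dense_params = flat * num_classes + num_classes  # weights + biases of that layer
    return shapes, flat, dense_params


shapes, flat, dense_params = trace_shapes(32, 32, 3, LAYERS, num_classes=43)
for shape in shapes:
    print(shape)
print(flat, dense_params)  # 2304 99115
```

The trace shows why pooling layers matter: without the two 2×2 poolings, the flattened vector (and the fully connected layer's parameter count) would be many times larger.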
Citation

M. GADRI Said, (2021), "Efficient Traffic Signs Recognition Based on CNN Model for Self-Driving Cars", [international] Intelligent Computing & Optimization. ICO 2021. 30-31 Dec, 2021 , Thailand

2020

Diabetic patient classification: an efficient algorithm based on deep learning approach

During the last decade, important progress has been made in machine learning (ML), especially with the emergence of a new subfield called deep learning (DL) and of Convolutional Neural Networks (CNNs). This trend has produced much more sophisticated algorithms that achieve high performance in many disciplines, such as pattern recognition, image classification, and computer vision, as well as other supervised and unsupervised classification tasks. In this work, we developed an automatic classifier that classifies diabetic patients based on blood characteristics, using ML and DL approaches. Initially, we performed the classification task with several ML algorithms. Then we proposed a simple CNN model composed of several layers. Finally, we compared the ML and DL algorithms. For the implementation, we used Python, TensorFlow, and Keras, which are the most widely used tools in the field.
Citation

M. GADRI Said, (2020), "Diabetic patient classification: an efficient algorithm based on deep learning approach", [national] International Journal of Advances in Electronics and Computer Science IJAECS , International Journal of Advances in Electronics and Computer Science IJAECS

Efficient Arabic handwritten character recognition based on machine learning and deep learning approaches

Arabic handwritten character recognition has been studied for many decades, yet several difficulties still prevent significant advances in this important field, such as the variability of handwriting from one person to another, the limited availability of large databases, and the complicated morphology of Arabic, a very rich Semitic language.
In this paper, we propose a deep learning model based on convolutional neural networks (CNNs) that achieves high performance in Arabic handwritten character recognition.
Citation

M. GADRI Said, (2020), "Efficient Arabic handwritten character recognition based on machine learning and deep learning approaches", [national] Journal of Advanced Research in Dynamical & Control Systems , Journal of Advanced Research in Dynamical & Control Systems

Building Best Predictive Models Using ML and DL Approaches to Categorize Fashion Clothes

Today, deep learning (DL) has become the new trend in machine learning (ML) because it yields much more sophisticated pattern recognition and image classification than classic ML approaches. Among the methods used in DL, CNNs are of special interest. In this work, we developed an automatic classifier that classifies a large number of fashion clothing articles based on ML and DL approaches. Initially, we performed the classification task with several ML algorithms; then we proposed a new CNN model composed of several convolutional layers, one max-pooling layer, and one fully connected layer. Finally, we compared the different algorithms. As programming tools, we used Python, TensorFlow, and Keras, which are the most widely used in the field.
Citation

M. GADRI Said, (2020), "Building Best Predictive Models Using ML and DL Approaches to Categorize Fashion Clothes", [international] The 19th International Conference on Artificial Intelligence and Soft Computing ICAISC 2020 (H5-index = 20) (Springer Conference). held in Zakopane, Poland, 12 - 14, October 2020 , Zakopane, Poland

2019

Diabetic Patient Classification: An Efficient Algorithm Based on Deep Learning Approach

During the last decade, important progress has been made in machine learning (ML), especially with the emergence of a new subfield called deep learning (DL) and of Convolutional Neural Networks (CNNs). This trend has produced much more sophisticated algorithms that achieve high performance in many disciplines, such as pattern recognition, image classification, and computer vision, as well as other supervised and unsupervised classification tasks. In this work, we developed an automatic classifier that classifies diabetic patients based on blood characteristics, using ML and DL approaches. Initially, we performed the classification task with several ML algorithms. Then we proposed a simple CNN model composed of several layers. Finally, we compared the ML and DL algorithms. For the implementation, we used Python, TensorFlow, and Keras, which are the most widely used tools in the field.
Citation

M. GADRI Said, (2019), "Diabetic Patient Classification: An Efficient Algorithm Based on Deep Learning Approach", [international] International Conference on Science, Engineering&Technology (ICSET) , Istanbul, Turkey

2018

A New Multilingual Stemmer to Improve the Effectiveness of Text Categorization and Information Retrieval

Information retrieval (IR) is the process of finding information (generally documents) that matches the user's needs. One way to improve search effectiveness, as well as the quality of text categorization (TC), is to build an effective stemmer that helps match users' queries with relevant documents in IR and reduces the size of the textual representation in TC. This has always been an interesting research topic in IR and TC. Stemming can be defined as the process of reducing inflected and derived words to their reduced forms (stems or roots). Many stemmers have been developed for different languages, but they still have many weaknesses and problems. In the present work, we have developed a new multilingual stemming approach based on the extraction of the word root that exploits the technique of character n-grams. Our experiments were conducted on three languages: Arabic, English, and French.
Citation

M. GADRI Said, (2018), "A New Multilingual Stemmer to Improve the Effectiveness of Text Categorization and Information Retrieval", [international] Doctoral seminar of Vienna University , University of Vienna, Austria

Arabic Information Retrieval: Influence of Stemming on the Effectiveness of Search in IRS

Stemming is a technique that improves the quality of categorization in TC and the effectiveness of search in information retrieval systems (IRS). In Arabic, two families of methods are used to find the stem of a word: morphological methods, which are difficult to implement and require deep linguistic knowledge of Arabic (such as morphological rules), and statistical methods, which are easy to implement, more practical, and require no prior linguistic knowledge, only some probability calculations.
In this paper, we propose a new, completely statistical Arabic stemmer based on root extraction. It therefore does not require any morphological rules or grammatical patterns.
Citation

M. GADRI Said, (2018), "Arabic Information Retrieval: Influence of Stemming on the Effectiveness of Search in IRS", [international] 7th International Conference on Advanced Technology ICAT'18 , Antalya, Turkey

2017

Multilingual Text Categorization: (Based on Machine Learning Algorithms and Ontologies)

Text categorization is an important task in the text mining process that consists of assigning a set of texts to a set of predefined categories using learning algorithms. There are two kinds of text categorization: monolingual and multilingual. The main question addressed in this manuscript is how to exploit machine learning concepts and algorithms for the contextual categorization of multilingual texts. Our study of this subject led us to propose several solutions and contributions, notably: (1) a simple, fast, and effective algorithm to identify the language of a text in a multilingual corpus; (2) an improved algorithm for Arabic stemming based on a statistical approach, whose main objective is to reduce the size of the term vocabulary and thus increase the quality of the obtained categorization in TC and the effectiveness of search in IR; (3) a new multilingual stemmer that is general and completely independent of any language; (4) the application of a new set of pseudo-distances to categorize the texts of a large corpus such as the Reuters-21578 collection. All these solutions were the subject of several academic papers published in international conferences and journals.
Citation

M. GADRI Said, (2017), "Multilingual Text Categorization: (Based on Machine Learning Algorithms and Ontologies)", [national] , Noor Publishing

Application of a New Set of Pseudo-Distances in Documents Categorization

Automatic text classification is a very important task that consists of assigning labels (categories, groups, classes) to a given text based on a set of previously labeled texts called the training set. The work presented in this paper addresses the problem of automatic topical text categorization. It is supervised classification because it works on a predefined set of classes, and topical because it uses the topics or subjects of texts as classes. In this context, we used a new approach based on the k-NN algorithm together with a new set of pseudo-distances (distance metrics) known in the field of language identification. We also proposed a simple and effective method to improve the quality of the resulting categorization.
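The paper classifies documents with k-NN under pseudo-distances borrowed from language identification. Those specific pseudo-distances are not listed in this abstract, so the sketch plugs in a simple stand-in, the L1 (Manhattan) distance between relative word frequencies, behind a `distance` parameter that any other measure could replace. The documents, labels, and `k = 3` are toy assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, float]


def relative_freqs(text: str) -> Profile:
    """Relative frequency of each lowercase token in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}


def l1_distance(p: Profile, q: Profile) -> float:
    """Stand-in pseudo-distance: sum of absolute frequency differences over the union of terms."""
    return sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q))


def knn_classify(text: str, training: List[Tuple[str, str]], k: int = 3,
                 distance: Callable[[Profile, Profile], float] = l1_distance) -> str:
    """Majority vote among the k training documents closest to `text`."""
    query = relative_freqs(text)
    neighbours = sorted(training, key=lambda doc: distance(query, relative_freqs(doc[0])))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


training = [
    ("stocks fell as markets closed lower", "economy"),
    ("the central bank raised interest rates", "economy"),
    ("investors sold shares and markets dropped", "economy"),
    ("the team won the final match", "sport"),
    ("the striker scored two goals in the match", "sport"),
    ("fans cheered as the team lifted the cup", "sport"),
]
print(knn_classify("markets dropped as investors sold stocks", training))
print(knn_classify("the team scored in the final match", training))
```

Keeping the distance as a parameter mirrors the paper's idea of comparing several pseudo-distances under the same k-NN classifier.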
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2017), "Application of a New Set of Pseudo-Distances in Documents Categorization", [national] Neural Network World NNW , Czech Technical University in Prague Faculty of Transportation Sciences

Arabic Text Categorization: An Improved Algorithm Based on N-grams Technique to Extract Arabic Words Roots

One of the methods used to reduce the size of the term vocabulary in Arabic text categorization is to replace the different variants (forms) of words with their common root. This process is called root-based stemming. Searching for the root, or Arabic word root extraction, is more difficult than in other languages because Arabic has a very different and difficult structure: it is a very rich language with a complex morphology. Many algorithms have been proposed in this field. Some are based on morphological rules and grammatical patterns; they are therefore quite difficult and require deep linguistic knowledge. Others are statistical, so they are less difficult and based only on some calculations. In this paper, we propose an improved stemming algorithm based on root extraction and the n-gram technique that returns the stems of Arabic words without using any morphological rules or grammatical patterns.
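The abstract does not spell out how the n-gram technique locates the root. Arabic root letters can be separated by pattern letters (for example, the root ك ت ب inside مكتوب, where و sits between ت and ب), so contiguous n-grams alone may miss them. As an assumption-labeled illustration, not the authors' algorithm, the sketch enumerates ordered three-letter subsequences of a word as candidate trilateral roots and keeps the candidate with the highest count in a hypothetical root-frequency table, using no morphological rules or patterns.

```python
from itertools import combinations
from typing import Dict, List, Optional


def candidate_roots(word: str, size: int = 3) -> List[str]:
    """All order-preserving letter subsequences of the given size, without duplicates."""
    seen: set = set()
    out: List[str] = []
    for combo in combinations(word, size):
        cand = "".join(combo)
        if cand not in seen:
            seen.add(cand)
            out.append(cand)
    return out


def extract_root(word: str, root_freq: Dict[str, int]) -> Optional[str]:
    """Pick the most frequent known root among the candidates, or None."""
    matches = [c for c in candidate_roots(word) if c in root_freq]
    return max(matches, key=lambda c: root_freq[c]) if matches else None


# Hypothetical root-frequency table (counts are invented for illustration).
ROOT_FREQ = {"كتب": 120, "درس": 80, "تبن": 5}
print(extract_root("مكتوب", ROOT_FREQ))    # كتب
print(extract_root("يكتبون", ROOT_FREQ))   # كتب (the rarer candidate تبن loses)
print(extract_root("المدرسة", ROOT_FREQ))  # درس
```

The frequency tie-break is the statistical step: when several known roots fit the word, the one that is more common in the (assumed) corpus wins.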
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2017), "Arabic Text Categorization: An Improved Algorithm Based on N-grams Technique to Extract Arabic Words Roots", [national] International Arab Journal of Information Technology IAJIT , Zarqa University, Jordan

2016

Automatic Contextual Categorization of Multilingual Semi-Structured Documents

Text categorization is an important task in the text mining process that consists of assigning a set of texts to a set of predefined categories using learning algorithms. There are two kinds of text categorization: monolingual and multilingual. The main question addressed in this thesis is how to exploit machine learning concepts and algorithms for the contextual categorization of multilingual texts. Our study of this subject led us to propose several solutions and contributions, notably: a simple, fast, and effective algorithm to identify the language of a text in a multilingual corpus; an improved algorithm for Arabic stemming based on a statistical approach, whose main objective is to reduce the size of the term vocabulary and thus increase the quality of the obtained categorization in TC and the effectiveness of search in IR; a new multilingual stemmer that is general and completely independent of any language; and the application of a new set of pseudo-distances to categorize the texts of a large corpus such as the Reuters-21578 collection. All these solutions were the subject of several academic papers published in international conferences and journals.
Citation

M. GADRI Said, (2016), "Automatic Contextual Categorization of Multilingual Semi-Structured Documents", [national] University Farhat Abbes of Setif

2015

An Effective Multilingual Stemmer Based on the Extraction of the Root and the N-grams Technique

Stemming is a technique used to reduce inflected and derived words to their basic forms (stem or root). It is a very important pre-processing step in text mining and is generally used in many research areas, such as Natural Language Processing (NLP), Text Categorization (TC), Text Summarization (TS), Information Retrieval (IR), and other text mining tasks. Stemming is useful in text categorization to reduce the size of the term vocabulary, and in information retrieval to improve search effectiveness and thus return more relevant results. In this paper, we propose a new multilingual stemmer based on the extraction of the word root, in which we use the technique of n-grams. We…
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2015), "An Effective Multilingual Stemmer Based on the Extraction of the Root and the N-grams Technique", [international] AIST’2015 international scientific conference Analysis of Images, Social networks, and Texts , Yekaterinburg, Russia.

Multilingual Text Categorization: Increasing the Quality of Categorization by a Statistical Stemming Approach

Stemming is a technique used to reduce inflected and derived words to their basic forms (stem or root). It is a very important pre-processing step in text mining and is generally used in many research areas, such as Natural Language Processing (NLP), Text Categorization (TC), Text Summarization (TS), Information Retrieval (IR), and other text mining tasks. Stemming is useful in text categorization to reduce the size of the term vocabulary, and in information retrieval to improve search effectiveness and thus return more relevant results.
In this paper, we propose a new multilingual stemmer based on the extraction of the word root, in which we use the technique of n-grams. We validated our stemmer on three languages: Arabic, French, and English.
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2015), "Multilingual Text Categorization: Increasing the Quality of Categorization by a Statistical Stemming Approach", [international] International Conference on intelligent Information Processing Security and Advanced Communication (IPAC 2015), ACM Conference , Batna, Algeria

Multilingual information retrieval: increasing the effectiveness of search by stemming

Stemming is a technique used to reduce inflected and derived words to their basic forms (stem or root). It is a very important pre-processing step in text mining and is generally used in many research areas, such as Natural Language Processing (NLP), Text Categorization (TC), Text Summarization (TS), Information Retrieval (IR), and other text mining tasks. Stemming is useful in text categorization to reduce the size of the term vocabulary, and in information retrieval to improve search effectiveness and thus return more relevant results.
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2015), "Multilingual information retrieval: increasing the effectiveness of search by stemming", [international] 19th International Conference on Circuits, Systems, Communications and Computers (CSCC 2015) , Zakynthos Island, Greece

A new Multilingual Stemmer Based on the Extraction of the Root and the N-grams Technique

Stemming is a technique used to reduce inflected and derived words to their basic forms (stem or root). It is a very important pre-processing step in text mining and is widely used in many research areas, such as Natural Language Processing (NLP), Text Categorization (TC), Text Summarization (TS), Information Retrieval (IR), and other text-mining tasks. Stemming is frequently useful in text categorization to reduce the size of the term vocabulary, and in information retrieval to improve search effectiveness and thus return more relevant results.
In this paper, we propose a new multilingual stemmer based on the extraction of word roots and on the n-gram technique. We validated our stemmer on three languages: Arabic, French, and English.
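One well-known way to stem with character n-grams, in the spirit of Adamson and Boreham's shared-digram method, is to conflate word forms whose sets of character bigrams are similar according to the Dice coefficient. The sketch below assumes a 0.5 threshold and a simple greedy grouping, both chosen for illustration; it shows a generic technique and is not a reproduction of the authors' root-extraction stemmer.

```python
def bigrams(word: str) -> set[str]:
    """Set of distinct character bigrams of a word (works for any Unicode script)."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over distinct bigrams."""
    A, B = bigrams(a), bigrams(b)
    if not A and not B:  # both words are shorter than two characters
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(A & B) / (len(A) + len(B))


def conflate(words: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedily put each word into the first group whose first member is similar enough."""
    groups: list[list[str]] = []
    for w in words:
        for group in groups:
            if dice(w, group[0]) >= threshold:
                group.append(w)
                break
        else:  # no sufficiently similar group: start a new one
            groups.append([w])
    return groups


words = ["statistics", "statistical", "stemming", "stemmer", "language", "languages"]
print(conflate(words))
# [['statistics', 'statistical'], ['stemming', 'stemmer'], ['language', 'languages']]
```

Because the bigrams are taken from Unicode strings, the same code runs unchanged on Arabic, French, or English words, which is the main appeal of n-gram stemming in a multilingual setting; the threshold, however, would likely need tuning per language.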
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2015), "A new Multilingual Stemmer Based on the Extraction of the Root and the N-grams Technique", [international] ICIPCE’2015 International Conference on Information Processing and Control Engineering, Moscow, Russia

Information Retrieval: A New Multilingual Stemmer Based on a Statistical Approach

Stemming is a technique used to reduce inflected and derived words to their basic forms (stem or root). It is a very important pre-processing step in text mining and is widely used in many research areas, such as Natural Language Processing (NLP), Text Categorization (TC), Text Summarization (TS), Information Retrieval (IR), and other text-mining tasks. Stemming is frequently useful in text categorization to reduce the size of the term vocabulary, and in information retrieval to improve search effectiveness and thus return more relevant results.
In this paper, we propose a new multilingual stemmer based on the extraction of word roots and on the n-gram technique. We validated our stemmer on three languages: Arabic, French, and English.
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2015), "Information Retrieval: A New Multilingual Stemmer Based on a Statistical Approach", [international] 3rd International Conference on Control, Engineering & Information Technology (CEIT2015), IEEE Conference, Tlemcen, Algeria

Arabic Texts Categorization: Features Selection Based on the Extraction of Words’ Roots

One method used to reduce the size of the term vocabulary in Arabic text categorization is to replace the different variants (forms) of a word with their common root. Root extraction is harder for Arabic than for many other languages because Arabic has a very particular structure: it is a very rich language with a complex morphology. Many algorithms have been proposed in this field. Some are based on morphological rules and grammatical patterns; they are therefore quite complex and require deep linguistic knowledge. Others are statistical; they are simpler and rely only on some computations. In this paper, we propose a new statistical algorithm that extracts the roots of Arabic words using character n-grams, without any morphological rules or grammatical patterns.
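A minimal sketch of the general idea of rule-free, n-gram-based root extraction: compare the character bigrams of a word with those of each candidate root in a small lexicon, and return the most similar root by the Dice coefficient. The lexicon (`ROOTS`), the use of bigrams, and the Dice measure are assumptions for illustration; the paper's exact algorithm may differ.

```python
from typing import Optional

# Hypothetical toy lexicon of trilateral Arabic roots; a real system would use a full root list.
ROOTS = ["كتب", "درس", "علم", "لعب"]


def char_bigrams(s: str) -> set[str]:
    """Distinct character bigrams of a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice similarity between the bigram sets of two strings."""
    A, B = char_bigrams(a), char_bigrams(b)
    total = len(A) + len(B)
    return 2 * len(A & B) / total if total else 0.0


def extract_root(word: str, roots: list[str] = ROOTS) -> Optional[str]:
    """Return the lexicon root whose bigrams best match the word, or None if nothing overlaps."""
    best = max(roots, key=lambda r: dice(word, r))
    return best if dice(word, best) > 0 else None


for w in ["المكتبة", "يكتبون", "مدرسة", "معلم"]:
    print(w, extract_root(w))
```

On these four words the sketch returns كتب, كتب, درس and علم. It relies on root letters appearing next to each other, so forms with infixed letters, such as كاتب, share only one bigram with كتب and receive a weaker score; handling such cases well is precisely what a dedicated statistical algorithm has to address.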
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2015), "Arabic Texts Categorization: Features Selection Based on the Extraction of Words’ Roots", [international] 5th IFIP International Conference on Computer Science and its Applications (CIIA’2015), Tahar Moulay University, Saida, Algeria

2014

Contextual Categorization of Documents Using a New Panoply of Similarity Metrics

In this paper, we study the problem of automatic supervised classification of documents (document categorization). We propose a new panoply of similarity metrics inspired by the domain of language identification. We also propose a simple, optimal, and effective method to improve the quality of categorization.
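As an illustration of borrowing a language-identification metric for categorization, the sketch below builds a character-trigram frequency profile per category from labelled training texts and assigns a new document to the category whose profile has the highest cosine similarity with it. Cosine similarity is an assumed stand-in: the paper's own panoply of metrics is not reproduced here, and the tiny training set is hypothetical.

```python
import math
from collections import Counter


def trigram_profile(text: str, n: int = 3) -> Counter:
    """Character n-gram counts of a whitespace-normalized, lower-cased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def train(labelled: list[tuple[str, str]]) -> dict[str, Counter]:
    """Merge the n-gram profiles of all training texts of each category."""
    profiles: dict[str, Counter] = {}
    for label, text in labelled:
        profiles.setdefault(label, Counter()).update(trigram_profile(text))
    return profiles


def categorize(text: str, profiles: dict[str, Counter]) -> str:
    """Assign the category whose profile is most similar to the document's profile."""
    doc = trigram_profile(text)
    return max(profiles, key=lambda c: cosine(doc, profiles[c]))


# Hypothetical two-category training set.
profiles = train([
    ("sport", "the team won the football match and the players scored goals"),
    ("finance", "the bank raised interest rates and the stock market fell"),
])
print(categorize("the players scored two goals in the match", profiles))
```

Merging all training texts of a category into a single profile is one simple design choice; comparing against individual documents, or swapping in a rank-based distance, fits the same skeleton.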
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2014), "Contextual Categorization of Documents Using a New Panoply of Similarity Metrics", [international] International Conference on Advanced Technology & Sciences (ICAT’14), Antalya, Turkey.

Language Identification: A New Fast Algorithm to Identify the Language of a Text in a Multilingual Corpus

Identifying the language of a text is a very important preliminary phase in the categorization of multilingual documents and in information retrieval. This phase becomes difficult if we consider only the word as the basic unit of information in texts: word-based identification may work for some languages, such as French or English, but is very difficult for others, such as German, Chinese, and Arabic.
In this paper, we present the best-known identification methods and propose a new, fast, and effective method based on character n-grams. We also compare the obtained results with those of other methods under two text-segmentation approaches: the word approach and the n-gram approach.
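The contrast between the two segmentation approaches can be illustrated with a small sketch: the same toy identifier is run once on word tokens and once on character trigrams, scoring each language by the fraction of the text's units that occur in that language's training sample. The training sentences, the overlap score, and the function names are hypothetical; this is not the fast algorithm proposed by the authors.

```python
from typing import Callable, Optional


def words(text: str) -> list[str]:
    """Word segmentation: lower-cased whitespace tokens."""
    return text.lower().split()


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """N-gram segmentation: overlapping character n-grams of the space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


# Hypothetical one-sentence training samples; a real system needs much larger corpora.
TRAINING = {
    "fr": "le chat mange la souris dans la maison",
    "en": "the cat eats the mouse in the house",
}


def identify(text: str, segment: Callable[[str], list[str]]) -> Optional[str]:
    """Score each language by the fraction of the text's units seen in its training sample."""
    units = segment(text)
    if not units:
        return None
    scores = {}
    for lang, sample in TRAINING.items():
        known = set(segment(sample))
        scores[lang] = sum(u in known for u in units) / len(units)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(identify("les maisons", words))        # None: neither word was seen in training
print(identify("les maisons", char_ngrams))  # fr: shares " le", "mai", "son", ... with the French sample
```

Neither "les" nor "maisons" occurs in the training data, so the word approach has no evidence at all, while the n-gram approach still recovers the language from sub-word fragments.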
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2014), "Language Identification: A New Fast Algorithm to Identify the Language of a Text in a Multilingual Corpus", [international] 4th International Conference on Multimedia Computing and Systems ICMCS’14, IEEE Conference, University of Marrakesh, Marrakesh, Morocco

Language Identification: Proposition of a New Optimized Variant for the Method of Cavenar and Trenkle

Identifying the language of a text is a very important preliminary phase in the categorization of multilingual documents and in information retrieval. This phase becomes difficult if we consider only the word as the basic unit of information in texts: word-based identification may work for some languages, such as French or English, but is very difficult for others, such as German, Chinese, and Arabic.
In this paper, we present the best-known identification methods and propose a new, optimized, and effective variant of the Cavnar and Trenkle method based on character n-grams. We also compare the obtained results with those of other methods under two text-segmentation approaches: the word approach and the n-gram approach.
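For reference, here is a compact sketch of the original Cavnar and Trenkle scheme that this paper sets out to optimize: each text is reduced to a ranked profile of its most frequent character n-grams, and a document is assigned to the language whose profile minimizes the out-of-place distance. The n-gram range (1 to 3), the profile size of 300, and the tie-breaking rule are assumptions chosen for brevity; the authors' optimized variant is not reproduced.

```python
from collections import Counter


def profile(text: str, max_n: int = 3, size: int = 300) -> dict[str, int]:
    """Ranked n-gram profile mapping n-gram -> rank (0 = most frequent), for n = 1..max_n.

    Each token is padded with '_' on both sides, mirroring Cavnar and Trenkle's notation.
    """
    counts: Counter = Counter()
    for token in text.lower().split():
        padded = f"_{token}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending frequency, breaking ties alphabetically so results are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:size]
    return {g: rank for rank, g in enumerate(ranked)}


def out_of_place(doc_prof: dict[str, int], lang_prof: dict[str, int]) -> int:
    """Sum of rank differences; n-grams missing from the language profile get a maximum penalty."""
    max_penalty = len(lang_prof)
    return sum(
        abs(rank - lang_prof[g]) if g in lang_prof else max_penalty
        for g, rank in doc_prof.items()
    )


def identify(text: str, lang_profiles: dict[str, dict[str, int]]) -> str:
    """Return the language whose profile is closest to the text's profile."""
    doc = profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Hypothetical, far-too-small training texts, used only to exercise the code.
EN = "the quick brown fox jumps over the lazy dog"
FR = "le renard brun saute par dessus le chien paresseux"
lang_profiles = {"en": profile(EN), "fr": profile(FR)}
```

In practice each language profile would be built from a substantial training corpus; the short sentences above are only placeholders.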
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2014), "Language Identification: Proposition of a New Optimized Variant for the Method of Cavenar and Trenkle", [international] International Conference on Artificial Intelligence and Information Technology ICAIIT’14, University Kasdi Merbah, Ouargla, Algeria

Utilisation des Métriques de l’Identification de la Langue dans la Catégorisation Contextuelle de Documents (Using Language-Identification Metrics in the Contextual Categorization of Documents)

The work presented in this paper addresses the problem of automatic document classification; more precisely, supervised classification (categorization), since it operates on a predefined set of classes. We used a new approach based on distance metrics known from the field of language identification. We also proposed a simple and effective method to improve the quality of the resulting categorization.
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2014), "Utilisation des Métriques de l’Identification de la Langue dans la Catégorisation Contextuelle de Documents", [international] 1st International Symposium on Informatics and its Applications, ISIA 2014, University of M’sila, M’sila

2013

An Effective Method to Recognize the Language of a Text in a collection of Multilingual Documents

Identifying the language of a text means assigning the text to the language in which it is written. This identification has become important because of the growing diversity of textual data in different languages on the web. Reliable recognition of a text's language is not possible if we consider only the word as the basic unit of information: this may work for some languages but is very difficult for others. Segmenting the text into characteristic n-grams is a very efficient alternative in this field, and it has also become a preferred tool in language acquisition and in knowledge extraction from texts. In this paper, we present the best-known identification methods and propose a new method based on character n-grams. We also compare the obtained results with those of other methods under two approaches: segmentation into words and segmentation into n-grams.
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2013), "An Effective Method to Recognize the Language of a Text in a collection of Multilingual Documents", [international] 10th International Conference on Electronics, Computer and Computation ICECCO 2013, IEEE Conference, Turgut Ozal University, Ankara, Turkey

Une méthode flexible pour l’identification de la langue d’un texte dans un corpus hétérogène multilingue (A Flexible Method for Identifying the Language of a Text in a Heterogeneous Multilingual Corpus)

Identifying the language of a text means assigning the text to the language in which it is written. This identification has become important because of the growing diversity of textual data in different languages on the web. Moreover, reliable recognition of a text's language is not possible if we consider only the word as the basic unit of information: this may work for some languages, such as French or English, but is very difficult for others, such as German or Arabic. Segmenting text into characteristic n-grams is a very efficient alternative in this field, and it has also become a favorite tool for extracting knowledge from texts.
In this paper, we present the best-known identification methods and propose a new method based on a new similarity metric. We also compare the obtained results with those of other methods under two approaches: segmentation of texts into words and segmentation into n-grams.
Citation

M. GADRI Said, Abdelouahab Moussaoui, (2013), "Une méthode flexible pour l’identification de la langue d’un texte dans un corpus hétérogène multilingue", [national] 2ème Conférence nationale des études doctorales en Informatique CNEDI2013, Université de Skikda

2006

INTEROPERABILITE DES BASES DE DONNEES HETEROGENES ET REPARTIES (Interoperability of Heterogeneous and Distributed Databases)

The interoperability of databases and DBMSs, or more generally of heterogeneous and distributed information systems, has become a necessity to meet exchange and communication needs. It now plays a major role, especially with the massive interconnection of information systems via the Internet, intranets, and extranets.
Interoperability can be defined as the ability of information systems to collaborate, even when they are very different in nature, in order to provide common functionalities.
Our study focuses on the interoperability of heterogeneous and distributed databases. We present a state of the art of the field and describe the different approaches designed to achieve this interoperability, the best known of which are:
* the schema-mediation approach, which emphasizes two fundamental components: the mediator and the adapter (wrapper);
* the semantics-oriented context-mediation approach, which exploits the capabilities of ontologies and cooperation contexts and can resolve most semantic conflicts;
* the federation approach, which is essentially based on integrating data according to a common model called the pivot model;
* the data-warehouse approach, which is oriented much more toward the needs of decision-support systems, especially in companies that receive and process very large flows of information.
The comparative study of these approaches led to a very clear result in favor of the XML approach, as a widely adopted standard for exchange and interoperability.
The study concludes by proposing a new approach that combines three approaches, namely schema mediation, context mediation, and the XML approach, and that adopts a multi-agent architecture.
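The schema-mediation approach, with its two components (mediator and adapter), can be illustrated by a minimal sketch that also uses XML as the exchange format, echoing the study's conclusion in favor of XML. It assumes two hypothetical sources (an in-memory SQLite table and a plain list of records) whose local schemas are mapped by adapters onto a shared vocabulary (`name`, `city`). All class, table, and field names are illustrative, and the sketch is not the multi-agent architecture proposed in the thesis.

```python
import sqlite3
import xml.etree.ElementTree as ET


class SQLiteAdapter:
    """Hypothetical adapter (wrapper) over a relational source with its own local schema."""

    def __init__(self, conn: sqlite3.Connection, table: str, mapping: dict[str, str]):
        # mapping: shared field name -> local column name (trusted configuration, not user input)
        self.conn, self.table, self.mapping = conn, table, mapping

    def fetch(self, city: str) -> list[dict[str, str]]:
        cols = ", ".join(self.mapping.values())
        sql = f"SELECT {cols} FROM {self.table} WHERE {self.mapping['city']} = ?"
        return [dict(zip(self.mapping, row)) for row in self.conn.execute(sql, (city,))]


class ListAdapter:
    """Hypothetical adapter over a non-relational source: a list of dicts with other field names."""

    def __init__(self, records: list[dict[str, str]], mapping: dict[str, str]):
        self.records, self.mapping = records, mapping

    def fetch(self, city: str) -> list[dict[str, str]]:
        local_city = self.mapping["city"]
        return [
            {shared: rec[local] for shared, local in self.mapping.items()}
            for rec in self.records
            if rec[local_city] == city
        ]


class Mediator:
    """Sends one query expressed in the shared vocabulary to every adapter and merges the answers as XML."""

    def __init__(self, adapters):
        self.adapters = adapters

    def query(self, city: str) -> ET.Element:
        root = ET.Element("customers")
        for adapter in self.adapters:
            for record in adapter.fetch(city):
                customer = ET.SubElement(root, "customer")
                for field, value in record.items():
                    ET.SubElement(customer, field).text = str(value)
        return root


# Two heterogeneous sources describing the same kind of entity with different schemas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (nom TEXT, ville TEXT)")
conn.executemany("INSERT INTO clients VALUES (?, ?)", [("Ali", "Msila"), ("Sara", "Batna")])
sql_source = SQLiteAdapter(conn, "clients", {"name": "nom", "city": "ville"})
list_source = ListAdapter([{"fullName": "Omar", "town": "Msila"}], {"name": "fullName", "city": "town"})

mediator = Mediator([sql_source, list_source])
print(ET.tostring(mediator.query("Msila"), encoding="unicode"))
```

The mediator never sees the local column names (`nom`, `ville`, `fullName`, `town`); resolving them is entirely the adapters' job, which is the separation of concerns the schema-mediation approach relies on.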
Citation

M. GADRI Said, (2006), "INTEROPERABILITE DES BASES DE DONNEES HETEROGENES ET REPARTIES", [national] University Mohamed Boudiaf of M'sila
