Naive Bayes Classification

April 3, 2022, 2 min read

Fundamental Concept

Naive Bayes classification is a probability-based algorithm for predicting a label.
It calculates the probability that the data belongs to each label and assigns the data to the label with the highest probability.
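
In symbols, the decision rule can be sketched as follows. The notation is introduced here and is not from the original post: $x_1, \dots, x_n$ are the features of one data point, $y$ ranges over the labels, and the features are assumed to be conditionally independent given the label, which is the "naive" assumption.

$$
\hat{y} = \arg\max_{y} P(y \mid x_1, \dots, x_n) = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The second form follows from Bayes' theorem: the denominator $P(x_1, \dots, x_n)$ is the same for every label, so it can be dropped when only the largest value matters.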

Algorithm

Data Preprocessing

To apply Naive Bayes classification to a natural language processing problem, the input data must first be converted into vectors of features.
The preprocessing in this example converts each article title to BoW (Bag of Words) and then builds a vector of feature and label pairs.
First, only the nouns are extracted from each article title in the training data. Word order is ignored, and the nouns are treated as a set.

| Training data | Category |
|:---:|:---:|
| {"emotion", "masterpiece", "movie"} | Movie |
| {"spectacular", "action", "movie"} | Movie |
| {"masterpiece", "world", "emotion"} | Movie |
| {"sandstorm", "Mars"} | Space |
| {"Mars", "exploration", "resumption"} | Space |
| {"VR", "exploration", "resumption"} | Space |

Second, convert the word sets and categories into a form that is easier to handle.
For each word in the vocabulary, assign 1 if the training title contains the word and 0 if it does not. The category is encoded the same way: 1 for Movie and 0 for Space.

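| Training data | masterpiece | movie | spectacular | action | world | emotion | sandstorm | Mars | exploration | resumption | VR | Category |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| A moving masterpiece movie is revived | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| A spectacular action movie is released | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| The world is moved by the revival of a masterpiece | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| A sandstorm covers Mars | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| Mars exploration finally resumes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| Mars sandstorms and emotion, seen in VR | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |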

As shown in the table above, a representation that records only whether each word appears in a natural-language sentence is called BoW. Each row pairs these features with a label.

Third, convert the test data to BoW in the same way, using the same vocabulary.

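| Test data | masterpiece | movie | spectacular | action | world | emotion | sandstorm | Mars | exploration | resumption | VR | Category |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Moved by the action shown in a revived masterpiece | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ?? |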
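
The three preprocessing steps can be sketched in a few lines of Python. This is a minimal sketch, not the post's original code. The words are English glosses of the example's nouns, `to_bow` is a hypothetical helper name, and the noun sets are assumed to be extracted already (a real pipeline would need a Korean morphological analyzer for that step). The word "sequel" is a made-up out-of-vocabulary word, added to show that unknown words are ignored. Only four of the training titles are encoded to keep it short.

```python
# Vocabulary in the column order of the BoW table (English glosses, an assumption).
VOCABULARY = ["masterpiece", "movie", "spectacular", "action", "world", "emotion",
              "sandstorm", "Mars", "exploration", "resumption", "VR"]
LABEL_IDS = {"Movie": 1, "Space": 0}  # category encoding used in the table


def to_bow(words, vocabulary=VOCABULARY):
    """Return a 0/1 vector: 1 where the vocabulary word occurs in `words`."""
    present = set(words)
    return [1 if term in present else 0 for term in vocabulary]


# Noun sets assumed to be already extracted from four of the training titles.
training = [
    ({"emotion", "masterpiece", "movie"}, "Movie"),
    ({"spectacular", "action", "movie"}, "Movie"),
    ({"sandstorm", "Mars"}, "Space"),
    ({"Mars", "exploration", "resumption"}, "Space"),
]
X_train = [to_bow(words) for words, _ in training]
y_train = [LABEL_IDS[category] for _, category in training]

# The test title is encoded with the SAME vocabulary; "sequel" (hypothetical)
# is not in the vocabulary, so it contributes nothing to the vector.
x_test = to_bow({"masterpiece", "action", "emotion", "sequel"})

print(X_train[0])  # [1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(x_test)      # [1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
```

Reusing one fixed vocabulary for both training and test data keeps every vector the same length and keeps each column tied to the same word.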

Calculate probability

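During training, Naive Bayes classification calculates two kinds of probabilities:
- a. the probability of each label appearing
- b. the conditional probability of each word appearing under each label

To classify new data, multiply a by the b values of the words in the data for each label, and choose the label with the largest product.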

Sample Code

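from sklearn.naive_bayes import MultinomialNB

# BoW vectors of the six training titles (one column per vocabulary word)
X_train = [[1,1,0,0,0,1,0,0,0,0,0],
           [0,1,1,1,0,0,0,0,0,0,0],
           [1,0,0,0,1,1,0,0,0,0,0],
           [0,0,0,0,0,0,1,1,0,0,0],
           [0,0,0,0,0,0,0,1,1,1,0],
           [0,0,0,0,0,1,0,1,1,0,1]]

# Labels: 1 = Movie, 0 = Space
y_train = [1,1,1,0,0,0]

# Fit a multinomial Naive Bayes model, then classify the BoW vector of the test title
model = MultinomialNB()
model.fit(X_train, y_train)
model.predict([[1,0,0,1,0,1,0,0,0,0,0]])

array([1])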

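References

Akiba Shinya (秋庭伸也), Sugiyama Asei, and Terada Manabu. 머신러닝 도감: 그림으로 공부하는 머신러닝 알고리즘 17 (Illustrated Guide to Machine Learning: 17 Machine Learning Algorithms Learned Through Pictures). Translated by Lee Joongmin, 2019.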


Juyeong Shin
