Recognizing negative comments with artificial intelligence

Foreword

Artificial intelligence is not just for C/C++. With PHP, you can implement neural networks in your web applications. To teach myself and gain experience in artificial intelligence, I built an application that recognizes negative comments using the Naive Bayes algorithm. I used the Wikipedia entry http://en.wikipedia.org/wiki/Bayesian_spam_filtering to develop my classification code. The training data is derived from: http://help.sentiment140.com/for-students/

What is it?

grimedetector is a text classification application with a focus on reuse, customizability and performance. It is particularly useful for detecting negative (or positive) comments, or any other short texts. The application is based on a Naive Bayes statistical classifier.

Introduction to the Bayes Theorem

The Naive Bayes classifier is a machine learning method used to solve classification problems, that is, to sort cases into decision classes. The task of the classifier is to assign a new case to one of the classes; the set of classes must be finite and defined a priori.

Mathematical foundation

The code calculates the probability that a text is negative, given that it contains a specific word, using the following formula:

    Pr(S|W) = Pr(W|S) · Pr(S) / (Pr(W|S) · Pr(S) + Pr(W|H) · Pr(H))

where:

  • Pr(S|W) is the probability that a comment is negative, knowing that the word “replica” is in it;
  • Pr(S) is the overall probability that any given comment is negative;
  • Pr(W|S) is the probability that the word “replica” appears in negative comments;
  • Pr(H) is the overall probability that any given comment is positive;
  • Pr(W|H) is the probability that the word “replica” appears in positive comments.
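
As a worked example of the formula above, the following minimal sketch computes Pr(S|W) from raw counts. The function name `wordGrimeProbability` and all of the counts are hypothetical, chosen only for illustration; they are not taken from grime-detector or its training data.

```php
<?php

/**
 * Bayes' rule for a single word: Pr(S|W) from Pr(W|S), Pr(S), Pr(W|H), Pr(H).
 * Hypothetical helper, not part of the grime-detector code base.
 */
function wordGrimeProbability(float $pws, float $ps, float $pwh, float $ph): float
{
    $denominator = $pws * $ps + $pwh * $ph;
    if ($denominator == 0.0) {
        // The word was never seen in either class: no evidence, stay neutral.
        return 0.5;
    }
    return ($pws * $ps) / $denominator;
}

// Made-up training counts, for illustration only.
$negativeComments = 400; // comments labeled negative
$positiveComments = 600; // comments labeled positive
$negativeWithWord = 40;  // negative comments containing the word
$positiveWithWord = 6;   // positive comments containing the word

$total = $negativeComments + $positiveComments;
$ps  = $negativeComments / $total;            // Pr(S)   = 0.4
$ph  = $positiveComments / $total;            // Pr(H)   = 0.6
$pws = $negativeWithWord / $negativeComments; // Pr(W|S) = 0.1
$pwh = $positiveWithWord / $positiveComments; // Pr(W|H) = 0.01

$psw = wordGrimeProbability($pws, $ps, $pwh, $ph); // 0.04 / 0.046
echo round($psw, 2), PHP_EOL;                      // prints 0.87
```

With these counts the word appears ten times more often in negative comments than in positive ones, so seeing it raises the probability of a negative comment from the prior of 0.4 to about 0.87.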

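The code then combines the probabilities of all the unique words in a test comment to decide whether the text is negative, based on the following formula:

    p = (p1 · p2 · ... · pN) / (p1 · p2 · ... · pN + (1 − p1) · (1 − p2) · ... · (1 − pN))

Here p1 ... pN are the values of Pr(S|W) computed for each of the N words in the comment.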

The result p is typically compared to a given threshold to decide whether the comment is negative or not. If p is lower than the threshold, the comment is considered as likely positive, otherwise it is considered as likely negative.
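
To illustrate the combination and the threshold comparison described above, here is a minimal sketch. The function names `combineProbabilities` and `isLikelyNegative`, the per-word probabilities, and the threshold of 0.8 are assumptions made for this example; the source does not state which threshold the application uses.

```php
<?php

/**
 * Combine per-word probabilities p1..pN into one probability p:
 * p = (p1 * ... * pN) / (p1 * ... * pN + (1 - p1) * ... * (1 - pN)).
 *
 * @param float[] $probabilities per-word values, each strictly between 0 and 1
 */
function combineProbabilities(array $probabilities): float
{
    $product = 1.0;
    $complementProduct = 1.0;
    foreach ($probabilities as $p) {
        $product *= $p;
        $complementProduct *= (1.0 - $p);
    }
    return $product / ($product + $complementProduct);
}

/** Threshold rule: below the threshold is likely positive, otherwise likely negative. */
function isLikelyNegative(float $p, float $threshold): bool
{
    return $p >= $threshold;
}

$p = combineProbabilities([0.9, 0.8, 0.3]); // 0.216 / (0.216 + 0.014)
echo round($p, 2), PHP_EOL;                 // prints 0.94
var_dump(isLikelyNegative($p, 0.8));        // bool(true)
```

Two strongly negative words outweigh one mildly positive word here, so the combined value ends up above the assumed threshold.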

Naive Bayes Classifier Implementation in PHP

class NaiveBayesClassifier
{
    /** @var WordRepository $wordRepository */
    private $wordRepository;

    public function __construct(WordRepository $wordRepository)
    {
        $this->wordRepository = $wordRepository;
    }

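    /**
     * Combine the per-word probabilities into one probability that the
     * text is negative ("grime"), rounded to two decimal places.
     *
     * @param string[] $words words of the text to classify
     */
    public function classify($words)
    {
        // Product of p_i and product of (1 - p_i) over all words.
        $probabilityProducts = 1;
        $complementProducts = 1;
        foreach ($words as $word) {
            $probability = $this->wordProbability($word);
            $probabilityProducts *= $probability;
            $complementProducts *= (1 - $probability);
        }
        $grimeProbability = $probabilityProducts / ($probabilityProducts + $complementProducts);
        return round($grimeProbability, 2);
    }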

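    /**
     * Pr(S|W): probability that a text is grime given that it contains $word.
     */
    public function wordProbability($word)
    {
        $ps = $this->probabilityContentIsGrime();
        $ph = $this->probabilityContentIsHam();
        $pws = $this->probabilityWordInGrime($word);
        $pwh = $this->probabilityWordInHam($word);
        $denominator = $pws * $ps + $pwh * $ph;
        if ($denominator == 0) {
            // No evidence for either class; avoid dividing by zero.
            return 0.5;
        }
        $psw = ($pws * $ps) / $denominator;
        // Keep the result away from exactly 0 and 1 so that a single word
        // cannot force the combined probability to an extreme.
        $psw = $psw == 1 ? 0.99 : $psw;
        $psw = $psw == 0 ? 0.01 : $psw;
        return $psw;
    }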

    public function probabilityContentIsGrime()
    {
        return $this->wordRepository->getGrimeCount() / $this->wordRepository->getWordsCount();
    }

    public function probabilityContentIsHam()
    {
        return $this->wordRepository->getHamCount() / $this->wordRepository->getWordsCount();
    }

    public function probabilityWordInGrime($word)
    {
        /** @var Word|null $entry */
        $entry = $this->wordRepository->getWordByName($word);
        if (!$entry) {
            // Unknown word: fall back to a neutral probability.
            return 0.5;
        }
        return $entry->getGrimeCount() / $this->wordRepository->getGrimeCount();
    }

    public function probabilityWordInHam($word)
    {
        /** @var Word|null $entry */
        $entry = $this->wordRepository->getWordByName($word);
        if (!$entry) {
            // Unknown word: fall back to a neutral probability.
            return 0.5;
        }
        return $entry->getHamCount() / $this->wordRepository->getHamCount();
    }
}
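
One caveat when multiplying many probabilities, as `classify()` does, is floating-point underflow: for a very long text both products can round to zero, leaving a zero denominator. A common workaround is to do the combination in the log domain. The sketch below is an alternative formulation under that assumption, not code from the repository; `classifyInLogDomain` is a hypothetical name, and its input stands for the per-word values that `wordProbability()` would return.

```php
<?php

/**
 * Log-domain equivalent of p = prod(p_i) / (prod(p_i) + prod(1 - p_i)).
 * Uses p = 1 / (1 + exp(eta)) with eta = sum(log(1 - p_i) - log(p_i)).
 *
 * @param float[] $probabilities per-word values, each strictly between 0 and 1
 */
function classifyInLogDomain(array $probabilities): float
{
    $eta = 0.0;
    foreach ($probabilities as $p) {
        $eta += log(1.0 - $p) - log($p);
    }
    return 1.0 / (1.0 + exp($eta));
}

// Same inputs as a direct product: 0.216 / (0.216 + 0.014).
echo round(classifyInLogDomain([0.9, 0.8, 0.3]), 2), PHP_EOL; // prints 0.94

// 2000 neutral words: a direct product of 0.5^2000 underflows to zero,
// while the log-domain sum stays at eta = 0.
echo classifyInLogDomain(array_fill(0, 2000, 0.5)), PHP_EOL;  // prints 0.5
```

Because `wordProbability()` never returns exactly 0 or 1, the logarithms in this sketch are always finite for its outputs.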

Source: https://github.com/tarnawski/grime-detector-api