VulgarDetector – application to detect vulgar language in text

Foreword

Is it possible to automatically recognize comments that contain vulgar language and flag them as spam? How would you implement an application that looks after your WordPress website and protects it from vulgar comments?

Why this issue?

  • a lack of similar solutions
  • learning how to develop a WordPress plugin
  • learning about microservices
  • learning how to use memcache
  • a good introduction to artificial intelligence

The project consists of three parts

  1. Backend – a REST API application built on top of the Symfony 3 microframework
  2. Frontend – a simple static page (HTML, CSS, JS, jQuery) that demonstrates the functionality of the application
  3. WordPress plugin – checks comments using the backend application

How the application recognizes vulgar text

Checking the text is simple and relies on a dictionary of vulgar words. The whole process can be divided into five steps:

  1. Tokenization – the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens
  2. Lowercase tokens – convert uppercase letters to lowercase
  3. Remove common stopwords – a stopword is a commonly used word (such as "the" or "and") that carries little meaning on its own
  4. Remove duplicates
  5. Search for the remaining tokens in the database
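
The five steps above can be sketched in a few lines of PHP. This is only an illustration, not the code from the repository: the function name findVulgarWords is hypothetical, and the stopword list and the dictionary are small in-memory arrays standing in for the real database.

```php
<?php

/**
 * Hypothetical sketch of the five-step check. The stopword list and the
 * dictionary are assumptions: plain arrays instead of a database.
 */
function findVulgarWords(string $text, array $stopwords, array $dictionary): array
{
    // 1. Tokenization: split on anything that is not a letter or a digit.
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // 2. Lowercase tokens (multibyte-aware when the mbstring extension is loaded).
    $tokens = array_map(function ($token) {
        return function_exists('mb_strtolower') ? mb_strtolower($token) : strtolower($token);
    }, $tokens);

    // 3. Remove common stopwords.
    $tokens = array_diff($tokens, $stopwords);

    // 4. Remove duplicates.
    $tokens = array_unique($tokens);

    // 5. Search tokens in the dictionary (a database lookup in a real service).
    return array_values(array_intersect($tokens, $dictionary));
}

// Mild placeholder words are used instead of a real vulgar dictionary.
$found = findVulgarWords(
    'The DARN thing is, well, darn broken and the heck keeps coming',
    ['the', 'is', 'and', 'well'],
    ['darn', 'heck']
);
print_r($found);
```

Removing stopwords and duplicates before the lookup keeps the number of tokens sent to the dictionary small, which matters more once the lookup is a database or cache query rather than an array intersection.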

Presentation of the solution

  1. BACKEND
    Repository:
    https://github.com/tarnawski/vulgar-detector-api
    Staging:
    http://vulgardetector-api.ttarnawski.usermd.net/status
  2. FRONTEND
    Repository:
    https://github.com/tarnawski/vulgar-detector
    Staging:
    http://vulgardetector.ttarnawski.usermd.net/
  3. WORDPRESS PLUGIN
    Repository:
    https://github.com/tarnawski/vulgar-detector-plugin
    WordPress Plugin Directory:
    https://wordpress.org/plugins/vulgar-detector/

REST API and uploading images

Foreword

In some cases your REST API application must accept images, for example when you want to let users attach a photo to an announcement, an event, etc. In such cases a reasonable solution is a dedicated endpoint for uploading images.

Assumptions

We assume a separate endpoint for uploading images. One solution is to send the image encoded as base64; the specification might look like this:

POST http://server/data/media
body:
{
      "data": "data:image/jpeg;base64,/9j/4AAQSkZJ..."
}

The response should return the ID of the uploaded image so that it can be linked with another entity.

201 Created
Location: http://server/data/media/21323
body:
{
      "id": 21323
}
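
The "data" field in the request above is a data URI, not bare base64, so the "data:image/jpeg;base64," prefix has to be separated from the payload before decoding. A minimal sketch of that step follows; the helper name parseDataUri and its return shape are assumptions made for illustration, not part of the original project.

```php
<?php

/**
 * Hypothetical helper: splits a base64 data URI into its MIME type and
 * decoded bytes. Returns null when the input is malformed.
 */
function parseDataUri(string $dataUri): ?array
{
    // Expect "data:<type>/<subtype>;base64,<payload>".
    if (!preg_match('#^data:([\w.+-]+/[\w.+-]+);base64,(.+)$#s', $dataUri, $m)) {
        return null;
    }

    // Strict mode makes base64_decode() return false when the payload
    // contains characters from outside the base64 alphabet.
    $bytes = base64_decode($m[2], true);
    if ($bytes === false) {
        return null;
    }

    return ['mime' => $m[1], 'bytes' => $bytes];
}

// Simulate the JSON body from the specification with a few demo bytes.
$original = "\xFF\xD8\xFF\xE0 demo";
$body = json_encode(['data' => 'data:image/jpeg;base64,' . base64_encode($original)]);

$payload = json_decode($body, true);
$image = parseDataUri($payload['data']);
```

Keeping the MIME type from the prefix also makes it possible to pick the file extension from the actual content type instead of assuming one.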

Implementation with Symfony

Below is a service that decodes and saves the image.

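class FileUploadService
{
    /** @var string Directory where uploaded files are stored */
    private $uploadDir;

    public function __construct($uploadDir)
    {
        $this->uploadDir = $uploadDir;
    }

    /**
     * Decodes a raw base64 string. base64_decode() does not understand a
     * "data:...;base64," prefix, so strip it before calling this method.
     */
    public function base64Decode($base64)
    {
        return base64_decode($base64);
    }

    /**
     * Writes the decoded image into the upload directory.
     *
     * @return bool true on success, false when the file could not be written
     */
    public function upload($originalFileName, $image)
    {
        $originalFilePath = sprintf('%s/%s', $this->uploadDir, $originalFileName);

        // file_put_contents() returns the number of bytes written or false on
        // failure, so compare strictly against false.
        return file_put_contents($originalFilePath, $image) !== false;
    }
}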

And the ImageController that uses it:

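use Ramsey\Uuid\Uuid;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

class ImageController
{
    const IMAGE_TYPE = 'jpg';

    public function uploadAction(Request $request)
    {
        // Bind the JSON request body to the form so it can be validated.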
        $form = $this->createForm(FileType::class);

        $submittedData = json_decode($request->getContent(), true);
        $form->submit($submittedData);

        if (!$form->isValid()) {
            return $this->error($this->getFormErrorsAsArray($form));
        }

        /** @var File $file */
        $file = $form->getData();

        $fileName = sprintf('%s.%s', Uuid::uuid4()->toString(), self::IMAGE_TYPE);

        /** @var FileUploadService $fileUploadService */
        $fileUploadService = $this->get('accessibility_barriers.services.file_upload_service');

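        // Decode the payload and stop early if the file cannot be written,
        // so that no Image record is persisted without a file behind it.
        $imageFile = $fileUploadService->base64Decode($file->data);
        if (!$fileUploadService->upload($fileName, $imageFile)) {
            return $this->error(array('Could not save the uploaded image.'));
        }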

        $image = new Image();
        $image->setName($fileName);

        $em = $this->getDoctrine()->getManager();
        $em->persist($image);
        $em->flush();

        return $this->success($image, 'Image', Response::HTTP_CREATED, array('IMAGE_BASIC'));
    }