An Examination of Image Enhancements for Extracting Text from Images to Detect Cyberbullying

Latha Saradha, Northeastern Illinois University
Sarah DeZetter, Northeastern Illinois University
Salah Latif, Northeastern Illinois University

Rachel Adler and Rachel Trana are the faculty sponsors of this project.

Description

Cyberbullying is harassment carried out through electronic means. Users of social networking sites, particularly teenagers, are often victims of these attacks. Prior research on cyberbullying has focused primarily on text-based comments from social networking sites, examining whether they constitute cyberbullying. However, images can contain harmful text as well. Our research concentrates on extracting text from images posted on social networking sites in order to detect and classify those that contain cyberbullying. Text extraction from images is an important research problem in image processing with diverse applications, such as assistive tools for people who are blind or visually impaired. We focus on identifying effective algorithms for extracting text from social networking site images in order to prevent harmful content from being posted. Optical Character Recognition (OCR) software can extract text from images, but it can struggle when images have complex backgrounds and patterns. To address this, we examined multiple methods for enhancing and manipulating images before text extraction, including grayscale manipulations, dilation, erosion, angle corrections, adaptive thresholding, and noise removal. We tested our algorithms on images of varying complexity in terms of background, colors, and text styles. Rather than relying on one technique, we tested different combinations and orderings of image manipulations. Our results identify the combinations of image manipulations that led to greater overall accuracy across all images. Future work will examine ideal combinations for different types of images: we plan to categorize images and identify the sequence of manipulations that should be applied to each category.
The goal of this research project is to create a tool that social networking sites could use to extract text from images or memes in order to detect cyberbullying. Automatically flagging this content will not only help victims of cyberbullying but can also help those posting messages recognize cyberbullying, as many may be unaware of the harmful ramifications of their words.
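
As an illustration of testing different combinations and orderings of image manipulations, the following is a minimal sketch and not the project's actual code. The abstract does not name the libraries, OCR engine, or accuracy measure used, so everything here is an assumption: images are plain lists of grayscale values, a simple global threshold stands in for adaptive thresholding, dilation and erosion use a 3x3 neighborhood, accuracy is approximated by character-level similarity, and the `ocr` argument is a hypothetical callable supplied by the caller.

```python
from difflib import SequenceMatcher
from itertools import permutations


def threshold(img, cutoff=128):
    """Global binarization (a simple stand-in for adaptive thresholding)."""
    return [[255 if p >= cutoff else 0 for p in row] for row in img]


def _morph(img, pick):
    """Replace each pixel with pick() of its 3x3 neighborhood (clipped at edges)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(img):
    """Grow bright regions: each pixel takes the neighborhood maximum."""
    return _morph(img, max)


def erode(img):
    """Shrink bright regions: each pixel takes the neighborhood minimum."""
    return _morph(img, min)


def similarity(extracted, expected):
    """Character-level similarity in [0, 1]; an assumed accuracy measure."""
    return SequenceMatcher(None, extracted, expected).ratio()


def rank_pipelines(img, expected, steps, ocr):
    """Score every ordering of every non-empty subset of steps.

    steps maps a name to an image -> image function; ocr is a hypothetical
    callable mapping an image to the text it recognizes.
    Returns (score, ordering) pairs, best score first.
    """
    results = []
    for n in range(1, len(steps) + 1):
        for combo in permutations(steps, n):
            out = img
            for name in combo:
                out = steps[name](out)
            results.append((similarity(ocr(out), expected), combo))
    return sorted(results, key=lambda r: r[0], reverse=True)
```

Calling `rank_pipelines(image, known_text, {"threshold": threshold, "dilate": dilate, "erode": erode}, ocr)` would try all 15 orderings of one, two, or three steps. Because the number of orderings grows factorially with the number of manipulations, a larger set of steps would likely need pruning or the per-category approach described as future work.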

 
Jan 1st, 12:00 AM
