

Dr. B. S. Daga, Jason Dsouza, Ryan Furtado, Manupendra Tiwari, "Authorship Identification: Naïve Bayes with XGBoost Approach", International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), Volume 8, Issue 3, May - June 2019, Article 9232092, pp. 001-007, ISSN 2278-6856.

Authorship Identification: Naïve Bayes with XGBoost Approach

    Dr. B. S. Daga, Jason Dsouza, Ryan Furtado, Manupendra Tiwari


Abstract: Electronic text is now used for communication on a large scale, and much of it is published anonymously or under unverified names. For forensic applications, it is important to group texts that may have been written by the same individual under different aliases. Copyright disputes, in which several people claim ownership of the same content, pose a similar problem. Authorship identification, combined with mathematical or statistical analysis of texts, could be the key to solving it. The fundamental assumption of authorship identification is that each individual subconsciously favors certain words, writing patterns, and sentiments, and that these habits make their writing style unique. Features extracted from a text can therefore be used to distinguish one author from another. The problem statement for our system is as follows: build a system that can be trained to recognize an individual based on their writing style, i.e., the set of words (features) the individual uses frequently. This is also known as generating a writeprint (similar to a fingerprint). With the help of this writeprint, the system will be able to identify other documents or texts written by the same individual. This should help reduce plagiarism among authors and can also be used in forensics to identify criminals based on their writing.

Keywords: Authorship Identification, Handwriting Analysis, Plagiarism Detection, Writeprint, Feature Extraction.
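The abstract does not spell out the pipeline, so the following is only a minimal sketch of the word-frequency idea, assuming a multinomial Naïve Bayes classifier with add-one smoothing written in plain Python. The class name `WriteprintNB`, the tokenizer, and the toy corpus are hypothetical illustrations and are not taken from the paper; the XGBoost stage named in the title is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


class WriteprintNB:
    """Hypothetical multinomial Naive Bayes over per-author word counts."""

    def fit(self, texts, authors):
        self.word_counts = defaultdict(Counter)  # author -> word -> count
        self.doc_counts = Counter(authors)       # author -> number of documents
        for text, author in zip(texts, authors):
            self.word_counts[author].update(tokenize(text))
        # Shared vocabulary across all authors, used for smoothing.
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text):
        """Return log P(author) + sum of log P(word | author) for each author."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        scores = {}
        for author, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[author] / total_docs)
            for w in tokens:
                # Add-one (Laplace) smoothing avoids log(0) for words
                # this author never used.
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
            scores[author] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy corpus (invented for illustration only).
train_texts = [
    "the sea was calm and the ship sailed on",
    "the ship and the sea and the storm",
    "whilst thou art here, thou shalt speak",
    "thou shalt not speak whilst thou art away",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

model = WriteprintNB().fit(train_texts, train_authors)
print(model.predict("the storm and the sea"))
```

Here the per-author word counts play the role of the writeprint: a new text is attributed to the author whose word distribution makes it most probable. A real system would presumably use a much larger corpus and richer stylistic features than raw word counts.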

  • ISSN: 2278-6856
  • Source Type: Journal
  • Original language: English
