• Human vs machine - A multilingual analysis of writing

     


    This is my dissertation project.

    Generative AI models such as ChatGPT, GPT-4, and Google Bard (PaLM 2) are reshaping many sectors by producing human-like responses, blurring the line between AI-generated and human-written content.

    This shift has led to worldwide reliance on these models for tasks such as academic writing, fake reviews, misleading news, and social media posts. To tackle this, multilingual models have emerged that distinguish human-written from AI-generated text. However, most prior studies focused on English, with limited testing on other languages such as Japanese, German, and Hindi.

    Seven models and a perplexity-based method were analyzed, and five of the models were tested consistently across multiple languages. Because no comprehensive datasets existed for these languages, new datasets had to be developed. In general, the models performed well when trained and tested on diverse topics but struggled on single-topic datasets, RoBERTa in particular.
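    One way to probe the topic sensitivity described above is to train on a single topic and test on the others. The sketch below only illustrates that kind of split; it is not necessarily the protocol used in the dissertation. The sample keys (`text`, `label`, `topic`), the toy data, and the placeholder classifier `toy_predict` are all assumptions made for illustration.

```python
def single_topic_split(samples, train_topic):
    """Train on one topic only, test on every other topic."""
    train = [s for s in samples if s["topic"] == train_topic]
    test = [s for s in samples if s["topic"] != train_topic]
    return train, test


def accuracy(predict, test):
    """Fraction of test samples whose predicted label matches the true label."""
    if not test:
        raise ValueError("test set is empty")
    correct = sum(1 for s in test if predict(s["text"]) == s["label"])
    return correct / len(test)


# Toy data: the keys 'text', 'label', and 'topic' are assumptions.
samples = [
    {"text": "match report written by a fan", "label": "human", "topic": "sports"},
    {"text": "generated match summary", "label": "machine", "topic": "sports"},
    {"text": "hand-written film review", "label": "human", "topic": "movies"},
    {"text": "generated film review", "label": "machine", "topic": "movies"},
]

train, test = single_topic_split(samples, train_topic="sports")


def toy_predict(text):
    """Placeholder classifier (hypothetical): flags 'generated' texts as machine."""
    return "machine" if "generated" in text else "human"


score = accuracy(toy_predict, test)
```

    In a real experiment, `toy_predict` would be replaced by a fine-tuned classifier, and the single-topic score would be compared with the score from a mixed-topic split.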

    Misclassifications occurred, with some machine-generated texts labeled as human-written. BERT showed better overall performance in German and English, while XLM-RoBERTa and DistilBERT-Multilingual excelled on Hindi texts. Perplexity-based methods such as GPTZero effectively differentiated between human-written and machine-generated English texts, which suggests that language models may use watermarking algorithms. Language-specific pretrained models such as HindiBERT and BERTJapanese accurately classified human-written and machine-generated text in their respective languages.
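    The idea behind a perplexity-based check can be sketched in a few lines. Perplexity is the exponential of the average negative log-likelihood per token, and machine-generated text tends to score lower than human-written text under a language model. This is a minimal illustration, not GPTZero's actual method: the per-token log-probabilities are assumed to come from some language model, and the threshold of 20.0 is a hypothetical value that would need tuning on labeled data.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), using natural logs."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def classify(token_logprobs, threshold=20.0):
    """Label text as 'machine' when perplexity falls below the threshold.

    The threshold is a hypothetical value, not taken from the project.
    """
    return "machine" if perplexity(token_logprobs) < threshold else "human"


# Toy inputs: every token has probability 0.5 (predictable) or 0.01 (surprising).
predictable = [math.log(0.5)] * 4
surprising = [math.log(0.01)] * 4

label_predictable = classify(predictable)
label_surprising = classify(surprising)
```

    A fixed threshold is the simplest possible decision rule; it also makes the misclassification risk visible, since any fluent, predictable human text can fall below it.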

    For more details, visit:

    https://github.com/surajjeoor/Human_vs_machine_Analysis



    For inquiries about anything, feel free to reach out.

    ADDRESS

    Flat 1, 17A, Wallace Street, Stirling, UK, FK8 1NS

    EMAIL

    suj00014@students.stir.ac.uk
    sjeoor@outlook.com

    TELEPHONE

    +44 7776835527