Benjamin Marie's Blog

Analysis About AI, Natural Language Processing, and Machine Translation
Evaluation Scientific credibility

Do Bigger Evaluation Datasets Make Your Results More Significant?

May 13, 2023

The size of the test set shouldn’t have any impact on the evaluation, provided that the test set has been correctly created. Increasing its size shouldn’t change the p-value of…

Evaluation Machine translation Scientific credibility

Scientific Credibility in Machine Translation Research: Pitfalls and Promising Trends

May 11, 2023

Are we at a turning point? My conclusions from the annotation of 1,000+ scientific papers

Evaluation Machine translation

Traditional Versus Neural Metrics for Machine Translation Evaluation

Mar 9, 2023

Since 2010, 100+ automatic metrics have been proposed to improve machine translation evaluation. In this article, I present the most popular metrics that are used as alternatives, or in addition,…

Evaluation GPT LLM Machine translation

Translate with ChatGPT

Feb 16, 2023

A very robust machine translation system.

Evaluation Machine translation

12 Critical Flaws of BLEU

Dec 12, 2022

BLEU is an extremely popular evaluation metric for AI. It was originally proposed 20 years ago for machine translation evaluation, but it is nowadays commonly used in many natural language processing (NLP)…

Evaluation LLM Machine translation

How Good Is Google PaLM at Translation?

Dec 2, 2022

But how good is PaLM at translation compared to the standard machine translation encoder-decoder approach?

Evaluation Machine translation Scientific credibility

BLEU: A Misunderstood Metric from Another Age

Nov 5, 2022

In this article, we will go back 20 years to expose the main reasons that brought BLEU into existence and made it a very successful metric. We will look…

Evaluation LLM Machine translation Scientific credibility

Why the Evaluation of OpenAI Whisper Is Not Entirely Credible

Oct 31, 2022

Whisper is evaluated on 6 tasks (section 3 of the research paper). I demonstrate that the conclusions drawn from 3 of these evaluation tasks are flawed ❌ or misleading ❌.

Evaluation Machine translation

We Need Statistical Significance Testing in Machine Translation Evaluation

Oct 27, 2022

A rule of thumb may yield correct results but can’t be scientifically credible. Take any research paper or blog post presenting a new method for AI,…

Conference Evaluation Machine translation

A Large-Scale Automatic Evaluation of Machine Translation

Sep 29, 2022

Like every year since 2006, the Conference on Machine Translation (WMT) organized extensive machine translation shared tasks. Numerous participants from all over the world submitted their machine translation (MT) outputs…


About the author:
Ph.D., research scientist in NLP/AI.
Advocate of scientific credibility.
Building next-gen AI translation systems: https://slaitor.com

