Running an evaluation in the ModelFront console is an easy way to get quality predictions for a dataset.
A good evaluation helps you assess the quality of translations - for example, to estimate post-editing effort or to compare multiple translation APIs or custom models. You can also use an evaluation to check the quality of human translations, translation memories or parallel corpora.
It takes just a few clicks, and requires no code. Just like the API, an evaluation returns segment-level scores and can optionally include translations from the integrated engines.
A single evaluation is for a single language pair, engine and model. To compare languages, engines or models on the same original input segments, just create and run an evaluation for each combination.
Your evaluation is private by default - you can only share it by downloading the results file and sending it.
You can always edit Name and Note while an evaluation is running or after it finishes.
The source and target language must not be the same, unless Other (und) is selected.
By default, an evaluation requires a parallel data file.
If you only have source text, you can just select an engine to have segments translated by one of the major machine translation APIs like Google, Microsoft, DeepL or ModernMT.
The data file can contain parallel data - pairs of sentences or other segments in the source language and target language, similar to machine translation training data.
Or, if you don’t have translations, you can just upload a monolingual data file and have translations filled in by one of the external engines.
All files should be UTF-8 encoded.
TMX is an open, standard XML format for importing and exporting translation memories. Segments can contain tags, linebreaks and other control characters.
Only the selected language pair will be extracted. If the file includes multiple variants for that language pair, translations for all variants will be extracted.
A tab-separated-values (TSV) file is a plain-text file where columns are separated by the tab character. Applications like Excel can export a spreadsheet to a TSV and the Okapi framework can convert other file types to TSV.
The TSV file for evaluation must have exactly 2 columns (or 3 columns for accuracy testing), and the file name must end in .tsv.
ModelFront supports the Linear TSV standard.
The monolingual file format option is only for evaluations requesting machine translation. It should only include the original segments, and the machine translations will be filled in by the engine you selected.
If your data includes newline characters, consider TSV.
A TSV file with exactly 1 column can also be used. This is useful for data that has control characters like newlines within segments.
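For example, here is a minimal sketch of preparing a 2-column upload file with Linear TSV escaping. The escapeField helper is hypothetical, not part of any ModelFront tooling, and assumes the standard Linear TSV escapes: \\ for backslash, \t for tab, \n for newline and \r for carriage return.
// Escape a field per the Linear TSV convention so that tabs and
// newlines inside segments survive the tab/newline-delimited format.
const escapeField = (text) =>
  text
    .replace(/\\/g, '\\\\')
    .replace(/\t/g, '\\t')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');

// Each row is [source, target]; join fields with tabs and rows with newlines.
const rows = [['It’s a good day today', 'Es un buen día hoy']];
const tsv = rows.map((row) => row.map(escapeField).join('\t')).join('\n');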
By default, evaluations use our latest default generic model. You can also select a custom model from those that are available to your account.
If you select an automatic post-editing model, an edit will be generated for each segment alongside the quality score.
When an evaluation is finished, ModelFront will send you a notification email, and you’ll be able to preview, share and download a spreadsheet file with the full results.
The quality score is similar to a human evaluation score or BLEU score - an aggregate score for the whole set, 0 to 100, where higher is better. It’s only meaningful for evaluations that are large and diverse enough to represent a statistically significant sample.
A ModelFront aggregate quality score is just the average of the segment-level quality scores, weighted by the length of the original source text, including tags. Length-weighting makes the score better reflect actual quality and post-editing effort.
// The aggregate score is the average of the segment-level quality scores,
// weighted by the length of the original source text.
let weightedQuality = 0;
let totalLength = 0;
res.rows.forEach(({ quality }, i) => {
  const { original } = req.rows[i];
  weightedQuality += quality * original.length;
  totalLength += original.length;
});
const score = weightedQuality / totalLength;
So if the average quality is 90%, the score will be roughly 90.
The chart is a histogram showing the distribution of translations by quality. High-quality translations are clustered at the top.
If there are a lot of low-quality translations at the bottom, that’s a sign that there is an issue.
The chart can help you understand the effect of where you set a threshold. How many translations will you keep? What final quality will you get?
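For example, assuming rows is a hypothetical array of { quality, original } objects (the response rows joined with the request rows from the scoring example above), you could estimate the effect of a threshold like this:
// How many translations would a given threshold keep, and what would
// the length-weighted quality of the kept set be?
const threshold = 70; // hypothetical cutoff
const kept = rows.filter(({ quality }) => quality >= threshold);
const keepRate = kept.length / rows.length;
const keptLength = kept.reduce((sum, { original }) => sum + original.length, 0);
const keptScore =
  kept.reduce((sum, { quality, original }) => sum + quality * original.length, 0) /
  Math.max(keptLength, 1);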
The preview shows the translations with the highest predicted quality.
The full results are available as a TSV file with the following columns:
| Source | Translation ¹ | Approve ² | Score ³ |
|---|---|---|---|
| It’s a good day today | es un buen día hoy | TRUE | 84.00 |
| It’s a good day today | Es un buen día hoy | TRUE | 96.00 |
| It’s a good day today | Es un mal día hoy | FALSE | 30.00 |
² TRUE if the text in the translation column is approved.
³ Score of the translation / automatic post-edit.
The download data file is encoded and escaped the same as an upload data file in the TSV format. You may want to unescape control characters when converting it to another format.
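Here is a minimal unescaping sketch, again assuming the standard Linear TSV escapes; the unescapeField helper is hypothetical.
// Restore control characters that were escaped in a Linear TSV field.
const unescapeField = (text) =>
  text.replace(/\\[\\tnr]/g, (match) =>
    ({ '\\\\': '\\', '\\t': '\t', '\\n': '\n', '\\r': '\r' }[match]));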
If you open a TSV file in a spreadsheet application like Microsoft Excel, Apple Numbers or Google Sheets, make sure to change the Text Qualifier from " to None, in case some of your segments contain ".
If you open a TSV file in Microsoft Excel or in a Windows application, make sure to select UTF-8 as the file encoding.
How accurate is ModelFront quality prediction on your data?
If you have post-editing data, it’s easy to test and visualize.
We recommend about 500 lines per test. If you have more data, you can split it into multiple smaller accuracy tests by content type, quality tier or project.
Post-editing data consists of originals, machine translations and human translations - similar to quality prediction training data.
To start an accuracy test, create an evaluation with a 3-column TSV with a file name ending in .edt.tsv.
The accuracy test includes graphs of metrics like final quality and savings, as well as the correlation of predicted quality with post-editing distance.
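If you want to sanity-check that correlation on your own data, here is a rough sketch. The rows array and the levenshtein and pearson helpers are all hypothetical, not part of the ModelFront API, and post-editing distance is approximated here as a normalized character-level edit distance.
// Character-level Levenshtein distance via the classic dynamic program.
const levenshtein = (a, b) => {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
};

// Pearson correlation coefficient between two equal-length series.
const pearson = (xs, ys) => {
  const mean = (v) => v.reduce((s, x) => s + x, 0) / v.length;
  const mx = mean(xs);
  const my = mean(ys);
  const cov = xs.reduce((s, x, i) => s + (x - mx) * (ys[i] - my), 0);
  const sd = (v, m) => Math.sqrt(v.reduce((s, x) => s + (x - m) ** 2, 0));
  return cov / (sd(xs, mx) * sd(ys, my));
};

// rows: hypothetical array of { quality, translation, human } objects
// parsed from the accuracy test results.
const distances = rows.map(({ translation, human }) =>
  levenshtein(translation, human) / Math.max(translation.length, human.length, 1));
const r = pearson(rows.map(({ quality }) => quality), distances);
// A strongly negative r means higher predicted quality tracks less post-editing.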
The preview line will include the human post-edited translation if it differs from the machine translation.
The full results are available as a TSV file with the following columns:
| Source | Translation ¹ | Approve ² | Score ³ | Hybrid output ⁴ | Human output ⁵ |
|---|---|---|---|---|---|
| It’s a good day today | es un buen día hoy | TRUE | 84.00 | es un buen día hoy | es un buen día hoy |
| It’s a good day today | Es un buen día hoy | TRUE | 96.00 | Es un buen día hoy | Es un buen día hoy |
| It’s a good day today | Es un buen día hoy | FALSE | 30.00 | Es un buen día hoy | Es un buen día hoy |
² TRUE if the text in the translation column is approved.
³ Score of the translation / automatic post-edit.
⁴ Text from the translation column if approved, otherwise text from the human column.
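As a sketch of what the hybrid column represents, you can rebuild it from parsed result rows. Here rows is a hypothetical array of { approve, translation, human } objects read from the TSV, with approve as the string "TRUE" or "FALSE".
// Keep the machine translation where it was approved,
// otherwise fall back to the human translation.
const hybrid = rows.map(({ approve, translation, human }) =>
  approve === 'TRUE' ? translation : human);

// A rough savings estimate: the share of segments that need no human pass.
const savings = rows.filter(({ approve }) => approve === 'TRUE').length / rows.length;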