ModelFront console docs

How to use the ModelFront console

Running an evaluation in the ModelFront console is an easy way to get quality predictions for a dataset.

A good evaluation helps you assess the quality of translations, for example to estimate post-editing effort or to compare multiple translation APIs or custom models. You can also use an evaluation to filter or check the quality of human translations, translation memories or parallel corpora.

It takes just a few clicks and requires no code. Just like the API, an evaluation returns segment-level scores and can optionally include translations from the integrated engines.

A single evaluation is for a single language pair, engine and model. To compare languages, engines or models on the same original input segments, just create and run an evaluation for each combination.

Your evaluation is private by default. The only way to share it is to download the results file and share that file.

Creating an evaluation

Name and note

You can always edit Name and Note while an evaluation is running or after it finishes.

Source and target language

The source and target language must not be the same, unless other (und) is selected.


By default, an evaluation requires a parallel data file.

If you only have source text, you can just select an engine to have segments translated by one of the major machine translation APIs like Google, Microsoft, DeepL or ModernMT.

Data file

The data file can contain parallel data - pairs of sentences or other segments in the source language and target language, similar to machine translation training data.

Or, if you don’t have translations, you can just upload a monolingual data file and have translations from one of the external engines filled in.

All files should be UTF-8 encoded.

Parallel formats


TMX is an open and standard XML format for importing and exporting translation memories. Segments can contain tags and linebreaks and other control characters.

Only the selected language pair will be extracted. If the file includes multiple variants for that language pair, translations for all variants will be extracted.


A tab-separated-values (TSV) file is a plain-text file where columns are separated by the tab character. Applications like Excel can export a spreadsheet to a TSV and the Okapi framework can convert other file types to TSV.

The TSV file for evaluation must have exactly 2 columns (or 3 columns for accuracy testing) and the name must end in .tsv.

ModelFront supports the Linear TSV standard.
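As a rough sketch of what this means when preparing a file, the snippet below builds 2-column TSV text and escapes tabs, newlines, carriage returns and backslashes inside segments, following the escaping that Linear TSV describes. The helper names `escapeField` and `toTsv` and the sample segments are illustrative assumptions, not part of any ModelFront API.

```javascript
// Hypothetical helpers, not part of any ModelFront API.
// Linear TSV represents control characters inside a field as backslash escapes.
function escapeField(text) {
  return text
    .replace(/\\/g, '\\\\') // backslash first, so later escapes are not doubled
    .replace(/\t/g, '\\t')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
}

// Build TSV text from rows like [{ original, translation }].
function toTsv(rows) {
  return rows
    .map(({ original, translation }) =>
      [original, translation].map(escapeField).join('\t'))
    .join('\n') + '\n';
}

// Made-up sample segments containing a tab and a newline.
const tsv = toTsv([
  { original: 'Hello\tworld', translation: 'Hallo\tWelt' },
  { original: 'Line 1\nLine 2', translation: 'Zeile 1\nZeile 2' },
]);
console.log(tsv);
```

Each segment pair ends up on exactly one physical line, so the file keeps exactly 2 columns even when segments contain tabs or linebreaks.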

Monolingual formats

.txt, .text, .md, .markdown, .adoc, .se, .html, .xhtml, .align, .src, .trg, .srt

The monolingual file format option is only for evaluations requesting machine translation. It should only include the original segments, and the machine translations will be filled in with the engine you selected.

If your data includes newline characters, consider TSV.


A TSV file with exactly 1 column can also be used. This is useful for data that has control characters like newlines within segments.

File size

Evaluation supports very large files - there is no technical limit. You can evaluate files larger than 1GB with the Google Cloud Storage address option.

Depending on the segment length and our current load across all clients, it takes about 1 hour per million segments. Evaluations that include a request for machine translations take significantly longer, due to the latency of the external translation APIs.


By default, evaluations use our latest default generic model. You can also select a custom model from those that are available to your account.


When an evaluation is finished, ModelFront will send you a notification email, and you’ll be able to preview, share and download a spreadsheet file with the full results.


The quality score is similar to a human evaluation score or a BLEU score: an aggregate score for the whole set, from 0 to 100, where higher is better. It’s only meaningful for evaluations that are large and diverse enough to represent a statistically significant sample.

A ModelFront aggregate quality score is just the average of the segment-level quality scores, weighted by the length of the original source text, including tags. Length-weighting makes the score better reflect actual quality and post-editing effort.

// Length-weighted average of the segment-level quality scores
let weightedSum = 0;
let totalLength = 0;
res.rows.forEach(({ quality }, i) => {
  const { original } = req.rows[i];
  weightedSum += quality * original.length;
  totalLength += original.length;
});
const score = weightedSum / totalLength;

So if the average quality is 90%, the score will be roughly 90.
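To get a feel for the effect of length weighting, here is a small self-contained sketch with made-up numbers. The function name `aggregateQuality`, the row shape and the 0 to 100 scale of the sample scores are assumptions for illustration only.

```javascript
// Length-weighted average of segment-level quality scores.
// `rows` is a hypothetical array of { original, quality } objects.
function aggregateQuality(rows) {
  let weightedSum = 0;
  let totalLength = 0;
  for (const { original, quality } of rows) {
    weightedSum += quality * original.length;
    totalLength += original.length;
  }
  return totalLength === 0 ? NaN : weightedSum / totalLength;
}

// Made-up sample: one short good segment, one long weaker segment.
const sampleRows = [
  { original: 'OK', quality: 100 },                            // 2 characters
  { original: 'A much longer source sentence.', quality: 60 }, // 30 characters
];

const weighted = aggregateQuality(sampleRows);
const unweighted =
  sampleRows.reduce((sum, r) => sum + r.quality, 0) / sampleRows.length;

console.log(weighted);   // 62.5
console.log(unweighted); // 80
```

The long segment dominates the weighted score, which is the intended behavior: it also accounts for most of the reading and post-editing work.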


The chart is a histogram showing the distribution of translations by quality. High quality translations are clustered along the top.

If there are a lot of low-quality translations at the bottom, that’s a sign that there is an issue.

The chart can help you understand the effect of where you set a threshold. How many translations will you keep? What final quality will you get?
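The same questions can be explored numerically on downloaded results. The sketch below is only an approximation under stated assumptions: scores are on a 0 to 100 scale, and "final quality" is taken to be the length-weighted predicted quality of the segments kept above the threshold. The function name `thresholdSummary` and the sample rows are hypothetical.

```javascript
// Hypothetical helper: summarize what a quality threshold would keep.
// `rows` is an array of { original, quality }, quality assumed on a 0-100 scale.
function thresholdSummary(rows, threshold) {
  const kept = rows.filter((r) => r.quality > threshold);
  let weightedSum = 0;
  let totalLength = 0;
  for (const { original, quality } of kept) {
    weightedSum += quality * original.length;
    totalLength += original.length;
  }
  return {
    threshold,
    kept: kept.length,
    keptShare: rows.length ? kept.length / rows.length : 0,
    // Length-weighted predicted quality of the kept segments only.
    finalQuality: totalLength ? weightedSum / totalLength : NaN,
  };
}

// Made-up sample rows with equal-length originals.
const sample = [
  { original: 'aaaa', quality: 95 },
  { original: 'bbbb', quality: 85 },
  { original: 'cccc', quality: 40 },
  { original: 'dddd', quality: 20 },
];

const summaries = [0, 50, 90].map((t) => thresholdSummary(sample, t));
console.log(summaries);
```

Raising the threshold keeps fewer translations but raises the quality of what remains, which is the trade-off the chart visualizes.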


The preview shows the translations with the highest predicted quality.


The full results are available as a TMX file or as a TSV file. The TSV file has an additional third column with the predicted quality.

Small and medium datasets can be filtered right in the console before downloading.

To work with very large datasets, we recommend downloading the results as a TSV file. Guidance on common operations is provided below.


The download data file is encoded and escaped the same as an upload data file in the TSV format. You may want to unescape control characters when converting it to another format.
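A minimal sketch of that unescaping step, assuming the backslash escapes described by Linear TSV (`\t`, `\n`, `\r` and `\\`). The helper names and the sample line, including the format of its score column, are made up for illustration.

```javascript
// Hypothetical helper: reverse Linear TSV style escapes in one field.
// A single pass avoids misreading an escaped backslash followed by "n".
function unescapeField(field) {
  return field.replace(/\\([tnr\\])/g, (match, ch) => {
    switch (ch) {
      case 't': return '\t';
      case 'n': return '\n';
      case 'r': return '\r';
      default: return '\\';
    }
  });
}

// Split one downloaded line into its columns and unescape each of them.
function parseLine(line) {
  return line.split('\t').map(unescapeField);
}

// Made-up sample line: original, translation, predicted quality.
const [original, translation, quality] =
  parseLine('Line 1\\nLine 2\tZeile 1\\nZeile 2\t87');
console.log(original);
console.log(translation);
console.log(quality);
```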

If you open a TSV file in a spreadsheet application like Microsoft Excel, Apple Numbers or Google Sheets, make sure to change Text Qualifier from " to None, in case some of your segments contain ".

If you open a TSV file in Microsoft Excel or in a Windows application, make sure to select UTF-8 as the file encoding.

You can also work with it on the command line, which is recommended for larger files.


To sort by quality in Bash:

sort -t$'\t' -k3,3n file.tsv

To reverse sort, add -r.


To filter while preserving the order in Bash, for example to get only those with quality above 50%:

awk -F "\t" '{ if($3 > 50) { print }}' file.tsv

To just peek at the top or bottom, add `| head -n 100` or `| tail -n 100`. To count the lines, add `| wc -l`. To write the output to a file, add `>` followed by a file name.


To drop the third column with quality scores and keep only the filtered parallel data in Bash, use cut to get the first two columns:

cut -f1,2 file.tsv


To join multiple eval files with corresponding rows in Bash, use paste.
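If a script is more convenient than the shell, the same row-by-row join can be sketched in JavaScript. This is an illustrative equivalent of `paste a.tsv b.tsv`, not a ModelFront tool. The function name `pasteTsv` and the sample rows are assumptions, and unlike `paste` it throws when the row counts differ.

```javascript
// Hypothetical helper: join two TSV texts line by line, like `paste a.tsv b.tsv`.
function pasteTsv(a, b) {
  const linesA = a.replace(/\n$/, '').split('\n');
  const linesB = b.replace(/\n$/, '').split('\n');
  if (linesA.length !== linesB.length) {
    // Rows only correspond if both evaluations used the same input segments.
    throw new Error(`Row counts differ: ${linesA.length} vs ${linesB.length}`);
  }
  return linesA.map((line, i) => `${line}\t${linesB[i]}`).join('\n') + '\n';
}

// Made-up results of two evaluations on the same originals.
const evalA = 'Hello\tHallo\t92\nBye\tTschüss\t75\n';
const evalB = 'Hello\tHallo Welt\t40\nBye\tAuf Wiedersehen\t88\n';

const joined = pasteTsv(evalA, evalB);
console.log(joined);
```

Each joined row then holds the columns of both evaluations side by side, which makes it easy to compare engines or models on the same originals.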

Accuracy testing

How accurate is ModelFront quality prediction on your data?

If you have post-editing data or labeled data, it’s easy to test and visualize.

We recommend about 500 lines per test. If you have more data, you can split test sets by content type, quality tier, project or date to have multiple smaller accuracy tests.

Post-editing data

Post-editing data consists of originals, machine translations and human translations - similar to quality prediction training data. To start an accuracy test, create an evaluation with a 3-column TSV with a name ending in .edt.tsv.

The Accuracy test will include graphs of metrics like final quality and savings as well as the correlation of predicted quality against post-editing distance.

The preview line will include the human-post-edited translation if it differed from the machine translation.
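To illustrate the idea of correlating predicted quality with post-editing distance, here is a rough sketch. It assumes a character-level Levenshtein distance normalized by the longer string, and Pearson correlation. The source does not specify which distance or correlation measure the console uses, so both are stand-ins, and the rows are made up.

```javascript
// Classic dynamic-programming Levenshtein distance between two strings.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Pearson correlation coefficient of two equally long numeric arrays.
function pearson(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((s, x) => s + x, 0) / n;
  const meanY = ys.reduce((s, y) => s + y, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    cov += (xs[i] - meanX) * (ys[i] - meanY);
    varX += (xs[i] - meanX) ** 2;
    varY += (ys[i] - meanY) ** 2;
  }
  return cov / Math.sqrt(varX * varY);
}

// Made-up rows: machine translation, human post-edit, predicted quality (0-100).
const editRows = [
  { mt: 'The cat sat.', edited: 'The cat sat.', quality: 95 },
  { mt: 'The dog sit.', edited: 'The dog sat.', quality: 70 },
  { mt: 'A cat sitting', edited: 'The cat sat.', quality: 30 },
];

// Normalized distance: 0 means untouched, 1 means fully rewritten.
const distances = editRows.map(({ mt, edited }) =>
  editDistance(mt, edited) / Math.max(mt.length, edited.length, 1));
const correlation = pearson(editRows.map((r) => r.quality), distances);
console.log(distances, correlation);
```

If predictions are useful, the correlation should be clearly negative: the higher the predicted quality, the less the translation needed to be edited.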

Labeled data

Labeled data consists of originals, machine translations and human labels. For good translations, the label column should be blank. For bad translations, the label column should include some text, like a label, number or description. To start an accuracy test, create an evaluation with a 3-column TSV with a name ending in .lbl.tsv.

The Accuracy test will show how predicted quality compares to human labels.
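One simple way to make such a comparison yourself is to count how often a threshold on predicted quality agrees with the labels. The sketch below is an assumption-laden illustration, not the console's method: the function name `confusion`, the threshold of 50, the 0 to 100 scale and the sample rows are all hypothetical.

```javascript
// Hypothetical helper: compare predicted quality with human labels.
// A blank label means the translation was judged good; any text means bad.
function confusion(rows, threshold) {
  const counts = { goodKept: 0, goodFlagged: 0, badKept: 0, badFlagged: 0 };
  for (const { quality, label } of rows) {
    const isBad = label.trim() !== '';
    const kept = quality > threshold;
    if (isBad) {
      counts[kept ? 'badKept' : 'badFlagged']++;
    } else {
      counts[kept ? 'goodKept' : 'goodFlagged']++;
    }
  }
  return counts;
}

// Made-up labeled rows, quality assumed on a 0-100 scale.
const labeledRows = [
  { quality: 92, label: '' },
  { quality: 45, label: '' },
  { quality: 88, label: 'mistranslation' },
  { quality: 20, label: 'wrong term' },
];

const counts = confusion(labeledRows, 50);
console.log(counts);
```

Here `badKept` counts bad translations that would slip past the threshold, and `goodFlagged` counts good translations that would be sent for review unnecessarily.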

© ModelFront Inc.