Data Management

This documentation describes Heartex platform version 1.0.0, which is no longer supported. For information about managing data in Label Studio Enterprise Edition, the equivalent of Heartex platform version 2.0.x, see Label and annotate data.

The project Data Manager page provides many functions for task management and quality control. Switch to Quick View mode to explore tasks faster.

Full View of Data Manager

Quick View of Data Manager

	Enter Label Stream to start labeling tasks in the project's sampling order
	Enter Verify Stream to rate annotator completions
	Import tasks
	Export completions & results
	Ground Truth Manager
	Switch between Quick View and Full View mode using these buttons: in Quick mode you can move among tasks quickly; in Full mode you can see all statistics and task statuses.
	Click the pencil button to open the editor with the selected task, its completions from all annotators, and ML predictions. Use Ctrl + click in Quick mode to open the editor in a new tab.
	Delete the task from the project

Label Stream

In Label Stream mode, tasks are shown in the project's sampling order. This mode closely mirrors the annotator labeling experience, and the completion panel is hidden.

Verify Stream

Use Verify Stream to flag correct and mistaken completions. You are prompted to give a thumbs up or thumbs down to every completion in the project, in random order. All flags appear in the Review column on the Data Manager page.

Import Tasks

You can import data through our API or by uploading JSON, CSV, TSV, ZIP, or RAR files, and you can import more data from Data Manager at any time. All text and hypertext resources can be included in tasks directly; they are hosted on our servers, so no additional hosting is needed. For images, audio, time series, video, and other BLOBs, you need external hosting with http/https links or S3 storage.
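A minimal sketch of preparing a JSON file for upload, assuming the common Label Studio-style layout in which each task wraps its fields in a `data` object. The keys `text` and `image`, the URLs, and the helper `build_tasks_json` are hypothetical; the keys must match the names used in your labeling config. The check reflects the rule above: text is embedded directly, while BLOBs are referenced by http/https links (S3 references are not covered here).

```python
import json

# Hypothetical tasks. Each task wraps its fields in a "data" object (assumed layout);
# the keys "text" and "image" must match the names used in your labeling config.
tasks = [
    {"data": {"text": "The cat sat on the mat.",
              "image": "https://example.com/images/cat-001.jpg"}},
    {"data": {"text": "A dog barked twice.",
              "image": "https://example.com/images/dog-002.jpg"}},
]

def build_tasks_json(tasks, blob_fields=("image",)):
    """Serialize tasks to JSON, rejecting BLOB fields that are not http(s) links."""
    for i, task in enumerate(tasks):
        for field in blob_fields:
            value = task["data"].get(field)
            # Text is embedded directly; images and other BLOBs must be hosted externally.
            if value is not None and not str(value).startswith(("http://", "https://")):
                raise ValueError(f"task {i}: field {field!r} must be an http(s) URL")
    return json.dumps(tasks, ensure_ascii=False, indent=2)

payload = build_tasks_json(tasks)  # save as tasks.json, then upload it in Data Manager
```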

Export Completions

So you have been working hard labeling your data and have accumulated a respectable amount. How do you get the data out of the application and onto your computer? The platform provides an export function for this.

Export results are in JSON format. Because the import and export formats are the same, you can import an exported file into a Heartex project again (just enable the «Include full task descriptions» option). Export is also available through the API.

Aggregation of completions
If you set «Overlap of completion» for Collaborators to more than 1 in the project settings, tasks will have multiple completions. In this case, majority vote aggregation can help: it merges all completions of a task into one.
Include full task descriptions
This option includes the full body of each task in the exported file. Use it if you want to import the exported file into another project with the proper labeling config.
Include predictions
Includes ML predictions, presented as a “predictions” array for each task in the output JSON.
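A minimal sketch of majority vote aggregation, assuming each completion can be reduced to a single class label. Real completions carry full result objects, so this is a simplification, and the tie rule (the label seen first wins) is an assumption for this sketch rather than documented platform behavior. The function names are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among one task's completions."""
    if not labels:
        raise ValueError("task has no completions")
    # most_common orders equal counts by first occurrence, so ties go to the
    # earliest label (assumed tie rule for this sketch).
    return Counter(labels).most_common(1)[0][0]

def aggregate(completions_by_task):
    """Merge each task's completions into one label by majority vote."""
    return {task_id: majority_vote(labels) for task_id, labels in completions_by_task.items()}

# Hypothetical input: task id -> one class label per completion (overlap of 2 or 3)
example = {1: ["Cat", "Cat", "Dog"], 2: ["Dog", "Cat", "Dog"], 3: ["Cat", "Dog"]}
print(aggregate(example))  # {1: 'Cat', 2: 'Dog', 3: 'Cat'}
```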

Tips

If your project has no labeled data and you don’t enable the «Include full task descriptions» option, the download returns empty results.
If your project is not using a model or the requirements for a model have not yet been met, then the downloaded results will only include hand-created labels.
If your project has an ML model, the downloaded results will include both manual labels and model-assisted labels.
Export can take a long time when there are many completions. You can start the export and reload the Data Manager page; your export history is saved in «Last exports».

Ground Truth Completions

Ground Truth (GT) completions are special items which can be used for:

evaluating annotator statistics against GT completions
evaluating machine learning accuracy against GT completions
retraining the ML model with or without GT completions

You can create GTs in several ways:

mark a completion as GT in task explore mode using the star icon
mark a completion in the Data Manager table

In this case, the completion is selected in the following priority order:
1. project owner
2. other annotators
import completions marked as GT together with tasks
use the GT Manager for batch marking
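The priority rule for marking from the Data Manager table can be sketched as follows. This is a minimal illustration, assuming each completion records the annotator who created it; the `Completion` class, its `completed_by` field, and `pick_gt_completion` are hypothetical. The source does not say which annotator wins when several non-owners completed the task, so list order is assumed.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Completion:
    id: int
    completed_by: str  # annotator email; hypothetical field name

def pick_gt_completion(completions: List[Completion], owner: str) -> Optional[Completion]:
    """Choose which completion of a task becomes GT: the owner's first, then other annotators'."""
    for completion in completions:
        if completion.completed_by == owner:
            return completion
    # No owner completion: fall back to another annotator's completion.
    # Which annotator wins among several is not documented; list order is assumed.
    return completions[0] if completions else None

task_completions = [Completion(7, "alice@example.com"), Completion(9, "owner@example.com")]
print(pick_gt_completion(task_completions, owner="owner@example.com").id)  # 9
```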

Ground Truth Manager

The Ground Truth (GT) Manager is a fast way to mark multiple completions as GTs.
To open it, click the green button with a star on the Data Manager page.

First, set the completion filter. Second, set the percentage of filtered completions to mark as GTs. You can also reset all GT completions to regular completions here.
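The second step can be pictured as sampling a fraction of the filtered completions. A minimal sketch, assuming the selection is random and the count is rounded down; both choices, and the name `pick_gt_fraction`, are assumptions for illustration rather than documented behavior.

```python
import math
import random

def pick_gt_fraction(completion_ids, percent, seed=None):
    """Randomly choose `percent`% of the filtered completions to mark as GT.

    Random sampling and rounding down are assumptions for this sketch.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    ids = list(completion_ids)
    k = math.floor(len(ids) * percent / 100)
    return random.Random(seed).sample(ids, k)

# Hypothetical: 200 completions passed the filter; mark 10% of them as GT
gt_ids = pick_gt_fraction(range(1, 201), percent=10, seed=42)
print(len(gt_ids))  # 20
```

Passing a `seed` makes the selection reproducible, which is convenient when comparing annotator statistics across runs.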

Filters

Filters provide a standard way to find the tasks you are looking for. Combined filters work in intersection mode, so results must match every active filter. For example, you can find all completions that contain the class name Cat and were completed by annotator heartex@heartex.ai.
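Intersection mode means a record passes only if every active filter accepts it. A minimal sketch of the Cat and heartex@heartex.ai example, assuming completions are plain dictionaries; the field names `completed_by` and `labels`, the second annotator's address, and `apply_filters` are hypothetical.

```python
def apply_filters(items, filters):
    """Keep only items accepted by every filter (intersection mode)."""
    return [item for item in items if all(f(item) for f in filters)]

# Hypothetical completion records
completions = [
    {"id": 1, "completed_by": "heartex@heartex.ai", "labels": ["Cat"]},
    {"id": 2, "completed_by": "heartex@heartex.ai", "labels": ["Dog"]},
    {"id": 3, "completed_by": "other@example.com", "labels": ["Cat"]},
]
filters = [
    lambda c: "Cat" in c["labels"],                       # completion results filter
    lambda c: c["completed_by"] == "heartex@heartex.ai",  # collaborator filter
]
print([c["id"] for c in apply_filters(completions, filters)])  # [1]
```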

The result counters show statistics for the current page; they are updated after the page reloads, in the small “Found” panel to the right of the “Filters” button. All icons have tooltips with their names.

Task data: filter by a substring in a specified field
Completion results: filter by classes, types of labeling tags, and any other information from the “JSON as text” representation of completion.result
Prediction results: very similar to completion results
Collaborator: dropdown with the project annotators
Outliers: find tasks with low collaborator agreement (less than 33%) or high skip rates (more than 50%)
Flagged regions: show only tasks where completions have flagged regions
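The Outliers thresholds can be expressed as a simple rule. A minimal sketch: how the platform computes agreement is not described here, so `agreement` is taken as an input fraction in [0, 1], and the skip rate is assumed to be skips divided by the number of times the task was shown to annotators. All names and the sample statistics are hypothetical.

```python
def skip_rate(skipped, total):
    """Fraction of a task's assignments that were skipped (0.0 when nothing was assigned)."""
    return skipped / total if total else 0.0

def is_outlier(agreement, skipped, total):
    """Flag a task whose agreement is below 33% or whose skip rate is above 50%."""
    return agreement < 0.33 or skip_rate(skipped, total) > 0.5

# Hypothetical per-task stats: (agreement, skipped, times shown)
stats = {101: (0.25, 0, 3), 102: (0.80, 3, 5), 103: (0.66, 1, 4)}
print([tid for tid, s in stats.items() if is_outlier(*s)])  # [101, 102]
```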
