Import & Export

This documentation describes Heartex platform version 1.0.0, which is no longer supported. For information about importing data in Label Studio Enterprise Edition, the equivalent of Heartex platform version 2.0.x, see Get data into Label Studio.

How to import tasks?

Prepare your tasks in the format described below.
Go to Project > Data Manager > Add more data dialog.

Read more about importing and exporting through the UI in Data Manager.

Task format

Import and export formats are the same.
The platform stores tasks as a JSON-formatted list.
Each task is a dictionary-like structure, with some specific keys reserved for internal use:

id - task identifier. It is ignored at import and regenerated automatically.
data - task body, represented as a dictionary {"key": "value"}. Task data can hold any number of key-value pairs, but it must contain the source keys defined by the labeling config (i.e. the keys referenced by an object tag’s value="$key" attribute).
Depending on the object tag type, field values are interpreted differently:
- <Text value="$key">: value is taken as plain text
- <HyperText value="$key">: value is HTML markup
- <HyperText value="$key" encoding="base64">: value is base64-encoded HTML markup
- <Audio value="$key">: value is taken as a valid URL to an audio file
- <AudioPlus value="$key">: value is taken as a valid URL to an audio file with CORS policy enabled on the server side
- <Image value="$key">: value is a valid URL to an image file
For simpler imports, you can omit "data" from the task body. In this case, the whole task body is interpreted as task["data"], i.e. [{"key": "value"}] => [{"data": {"key": "value"}}]

You can explore the task data on the task explore page: click the «Input» button in the bottom-right corner.
completions (optional) - list of annotation results; see the Example section for details. You can import annotation results to use them in subsequent labeling tasks.
- id - unique completion identifier
- lead_time - time in seconds spent to create this completion
- ground_truth - mark completion as ground truth (false by default)
- result - completion result data
  - id - unique completion result identifier
  - from_name - name of the tag that was used to label the region (control tags)
  - to_name - name of the object tag that provided the region to be labeled (object tags)
  - type - type of the labeling tag
  - value - tag specific value that includes the labeling result details. The exact structure of value depends on the chosen labeling tag. Explore each tag for more details.
    
    You can explore the result on the task explore page: click the «Result» button in the bottom-right corner.
predictions (optional) - list of machine learning prediction results (aka pre-labeling results). Importing predictions is useful for automatic pre-labeling, active learning, and data exploration. Predictions follow the same format as completions, with some additional fields related to machine learning inference:
- score - the overall result score (probabilistic output, confidence level, etc.)
- model_version - string field in any format, it will be displayed in the labeling interface.

For more detailed information about task and completion structure, see Import API.
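The reserved-key rules above can be mirrored on the client side before uploading. The sketch below uses a hypothetical helper, `normalize_task` (not part of the platform), and assumes you pass in the set of `$key` names your labeling config references: a task without "data" has its whole body wrapped as task["data"], "id" is dropped because the platform ignores it at import anyway, and missing source keys raise an error.

```python
from typing import Any


def normalize_task(task: dict[str, Any], required_keys: set[str]) -> dict[str, Any]:
    """Return a shallow copy of `task` in the explicit {"data": {...}, ...} form.

    `required_keys` (hypothetical parameter) is the set of $keys your labeling
    config references, e.g. {"my_text"} for value="$my_text".
    """
    if "data" in task:
        # "id" is ignored at import, so drop it rather than send it
        normalized = {key: value for key, value in task.items() if key != "id"}
    else:
        # No "data": the whole task body is interpreted as task["data"]
        normalized = {"data": dict(task)}

    missing = required_keys - set(normalized["data"])
    if missing:
        raise ValueError(f"task data is missing keys: {sorted(missing)}")
    return normalized


def normalize_tasks(tasks: list[dict[str, Any]], required_keys: set[str]) -> list[dict[str, Any]]:
    """Normalize every task in an import list."""
    return [normalize_task(task, required_keys) for task in tasks]
```

For example, `normalize_tasks([{"my_text": "a"}], {"my_text"})` yields `[{"data": {"my_text": "a"}}]`, the same shape the platform derives from the short form.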

External resources & BLOBs

Images, audio, video, and other external files must be uploaded to a hosting service with http/https access. Your JSON/CSV/TSV task files must contain proper http/https URLs pointing to them. As an example, let’s prepare image tasks for import:

Upload the files to any hosting service, or serve them locally with any web server.
Copy the http/https links to your images.

Create tasks.json like this:

```
[{
  "image_source": "http://example.com/test1.jpg"
},
{
  "image_source": "http://example.com/test2.jpg"
}]
```

Open the Add more data dialog and select the prepared file.
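If you have many files, writing tasks.json by hand gets tedious. A minimal sketch, assuming your labeling config reads images from `$image_source` as in the example above; `build_image_tasks` and `write_tasks` are hypothetical helpers, not platform functions:

```python
import json
from pathlib import Path


def build_image_tasks(urls: list[str], key: str = "image_source") -> list[dict[str, str]]:
    """Build one task per image URL, using the short form without "data".

    `key` must match the $key used by the <Image> tag in your labeling config.
    """
    tasks = []
    for url in urls:
        # Task files must reference external files by http/https URL
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an http/https URL: {url}")
        tasks.append({key: url})
    return tasks


def write_tasks(tasks: list[dict[str, str]], path: Path) -> None:
    """Save the task list as a JSON file for the Add more data dialog."""
    path.write_text(json.dumps(tasks, indent=2), encoding="utf-8")


# Usage (writes tasks.json to the current directory):
# write_tasks(build_image_tasks(["http://example.com/test1.jpg",
#                                "http://example.com/test2.jpg"]), Path("tasks.json"))
```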

Don’t forget about CORS settings when importing tasks: CORS must be enabled and properly configured on the external hosting, otherwise task data sources won’t load.
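To check an external host before importing, you can request one of its files with an `Origin` header and inspect `Access-Control-Allow-Origin` in the response. This is a simplified sketch: it covers simple, non-credentialed GET requests only (not preflighted or credentialed ones), and the origin `https://app.heartex.ai` is an assumption based on the API URLs on this page; use the address your platform is actually served from.

```python
import urllib.request


def cors_allows(response_headers: dict[str, str], origin: str) -> bool:
    """Return True if Access-Control-Allow-Origin admits `origin`.

    Simplified: covers simple, non-credentialed GET requests only.
    """
    # HTTP header names are case-insensitive
    headers = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    return allowed == "*" or allowed == origin


def fetch_headers(url: str, origin: str) -> dict[str, str]:
    """GET `url` with an Origin header and return the response headers."""
    request = urllib.request.Request(url, headers={"Origin": origin})
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())


# Usage (performs a network request; the origin value is an assumption):
# cors_allows(fetch_headers("http://example.com/test1.jpg", "https://app.heartex.ai"),
#             "https://app.heartex.ai")
```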

Another option to import external resources is to use Cloud Storages.

Example

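Here is an example of a labeling config and a one-element task list for a text classification project. The # comments in the task list are explanatory only: JSON does not allow comments, so remove them before importing.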

```
<View>
  <Text name="message" value="$my_text"/>
  <Choices name="sentiment_class" toName="message">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
  </Choices>
</View>
```

```
[{
  # "id" is a reserved field; avoid using it when importing tasks
  "id": 123,

  # "data" must contain the "my_text" field defined by the labeling config,
  # and can optionally include other fields
  "data": {
    "my_text": "Opossum is great",
    "ref_id": 456,
    "meta_info": {
      "timestamp": "2020-03-09 18:15:28.212882",
      "location": "North Pole"
    }
  },

  # "completions" is a list of annotation results that match the labeling config schema
  "completions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Positive"]
      }
    }]
  }],

  # "predictions" are similar to "completions",
  # except that they also include ML-related fields such as the prediction "score"
  "predictions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Neutral"]
      }
    }],
    # "score" is used for active learning sampling mode
    "score": 0.95
  }]
}]
```
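To catch mismatches between tasks and the labeling config before import, you could run a check like the one below. It is a simplified, hypothetical sketch built on Python's `xml.etree.ElementTree`: it only looks at `value="$key"` attributes and tag `name` attributes, and the platform's own validation may check more, or check differently.

```python
import xml.etree.ElementTree as ET
from typing import Any


def config_data_keys(label_config: str) -> set[str]:
    """Collect task data keys referenced as value="$key" in a labeling config."""
    root = ET.fromstring(label_config)
    return {
        element.attrib["value"][1:]
        for element in root.iter()
        if element.attrib.get("value", "").startswith("$")
    }


def config_tag_names(label_config: str) -> set[str]:
    """Collect the name attributes of all tags (object and control tags)."""
    root = ET.fromstring(label_config)
    return {element.attrib["name"] for element in root.iter() if "name" in element.attrib}


def check_task(task: dict[str, Any], label_config: str) -> list[str]:
    """Return the problems found in `task` (hypothetical check); empty means none found."""
    problems = []
    # A task without "data" is interpreted as task["data"] as a whole
    data = task.get("data", task)
    for key in sorted(config_data_keys(label_config) - set(data)):
        problems.append(f"missing data key: {key}")

    # from_name/to_name in results must refer to tags defined in the config
    names = config_tag_names(label_config)
    for field in ("completions", "predictions"):
        for item in task.get(field, []):
            for region in item.get("result", []):
                for ref in ("from_name", "to_name"):
                    if region.get(ref) not in names:
                        problems.append(f"{field}: unknown {ref} {region.get(ref)!r}")
    return problems
```

With the config above, a task whose prediction uses `"from_name": "sentiment"` would be reported, since the config only defines `message` and `sentiment_class`.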

Import file types

You can download an example import file for your project in any supported format from the Add more data dialog. Each file is limited to 250k tasks and 200 MB.

JSON - as described in Task format.
CSV / TSV - when a CSV / TSV formatted text file is used, column names are interpreted as task data keys:
```
my_text,optional_field
this is a first task,123
this is a second task,456
```
TXT - in a typical scenario, you use only one input data stream (in other words, only one object tag specified in the labeling config). In this case, you don’t need the JSON format; simply write your values in a plain text file, one per line, e.g.
```
this is a first task
this is a second task
```
ZIP archives with JSONs, CSV, TSV
RAR archives with JSONs, CSV, TSV
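The CSV/TSV and TXT rules can be sketched as follows. This is a hypothetical client-side conversion, not the platform's parser: for TXT, you supply the `key` argument (the `$key` of your single object tag) yourself, skipping blank lines is an assumption, and reading "200 MB" as 200 × 1024 × 1024 bytes is also an assumption.

```python
import csv
import io
from typing import Any

MAX_TASKS_PER_FILE = 250_000
# "200 MB" read as 200 * 1024 * 1024 bytes; the exact byte count is an assumption
MAX_FILE_BYTES = 200 * 1024 * 1024


def tasks_from_csv(text: str, delimiter: str = ",") -> list[dict[str, Any]]:
    """Turn CSV/TSV text into tasks; column names become task data keys.

    Every value is read as a string (e.g. "123", not 123).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{"data": dict(row)} for row in reader]


def tasks_from_txt(text: str, key: str) -> list[dict[str, Any]]:
    """Turn plain text into tasks, one per non-empty line.

    `key` must match the $key of the single object tag in your labeling config.
    """
    return [{"data": {key: line}} for line in text.splitlines() if line.strip()]


def check_file_limits(raw: bytes, tasks: list[dict[str, Any]]) -> None:
    """Raise ValueError if a file exceeds the per-file limits stated above."""
    if len(raw) > MAX_FILE_BYTES:
        raise ValueError(f"file is {len(raw)} bytes; limit is {MAX_FILE_BYTES}")
    if len(tasks) > MAX_TASKS_PER_FILE:
        raise ValueError(f"file has {len(tasks)} tasks; limit is {MAX_TASKS_PER_FILE}")
```

For TSV input, pass `delimiter="\t"`.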

Supported image formats

.png .jpg .jpeg .tiff .bmp .gif

Supported audio formats

.wav .aiff .mp3 .au .flac
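When building tasks from a directory or bucket listing, a small extension check against the lists above can filter out unsupported files. A sketch with a hypothetical `media_kind` helper; treating extensions case-insensitively is an assumption, not documented platform behavior:

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif"}
AUDIO_EXTENSIONS = {".wav", ".aiff", ".mp3", ".au", ".flac"}


def media_kind(name_or_url: str) -> Optional[str]:
    """Return "image", "audio", or None based on the file extension."""
    # Ignore query strings and fragments such as "?size=2"
    path = urlparse(name_or_url).path
    # Case-insensitive matching (".JPG" treated as ".jpg") is an assumption
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return None
```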

Quick API overview

Read more in the task import API and full task API sections.

Import tasks API

```
curl -H 'Content-Type: application/json' -H 'Authorization: Token abc123' \
  -X POST 'https://app.heartex.ai/api/projects/1/tasks/bulk/' --data @my_file.json
```

where my_file.json is:

```
[{
  "data": {
    "my_image_url": "https://app.heartex.ai/static/samples/kittens1.jpg"
  }
}, {
  "data": {
    "my_image_url": "https://app.heartex.ai/static/samples/kittens2.jpg"
  }
}]
```
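The same import call can be made from Python using only the standard library. A minimal sketch: `build_import_request` and `import_tasks` are hypothetical helpers, `abc123` stands in for your API token, and since the response body format isn't documented here, `import_tasks` returns it as raw text.

```python
import json
import urllib.request
from typing import Any


def build_import_request(base_url: str, project_id: int, token: str,
                         tasks: list[dict[str, Any]]) -> urllib.request.Request:
    """Build the POST request for the bulk import endpoint used in the curl example."""
    url = f"{base_url.rstrip('/')}/api/projects/{project_id}/tasks/bulk/"
    body = json.dumps(tasks).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
    )


def import_tasks(base_url: str, project_id: int, token: str,
                 tasks: list[dict[str, Any]]) -> tuple[int, str]:
    """Send the request and return the HTTP status and the raw response body."""
    request = build_import_request(base_url, project_id, token, tasks)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


# Usage (performs a network request):
# import_tasks("https://app.heartex.ai", 1, "abc123",
#              [{"data": {"my_image_url": "https://app.heartex.ai/static/samples/kittens1.jpg"}}])
```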

Retrieve task API

To view the task format, open the following URL in your browser or request it with curl (replace <task_id> with a real task ID, e.g. 2353):

```
curl -H 'Authorization: Token abc123' https://app.heartex.ai/api/tasks/<task_id>/
```

A task has the following format:

```
{
  "id": 2353,
  "data": {
    "my_image_url": "https://app.heartex.ai/static/samples/kittens.jpg"
  },
  "accuracy": 0.0,
  "created_at": "2019-02-04T20:33:51.361394Z",
  "updated_at": "2019-02-04T20:33:51.361430Z",
  "is_labeled": false,
  "project": 2
}
```
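A Python equivalent of the retrieve call might look like this. It assumes the endpoint accepts the same `Authorization: Token ...` header as the import API; `fetch_task` is a hypothetical helper, and its `opener` parameter exists only so the function can be exercised without a network connection.

```python
import json
import urllib.request
from typing import Any, Callable


def fetch_task(base_url: str, task_id: int, token: str,
               opener: Callable[..., Any] = urllib.request.urlopen) -> dict[str, Any]:
    """Retrieve one task as a dict via GET /api/tasks/<task_id>/.

    `opener` defaults to urllib.request.urlopen and can be swapped out in tests.
    """
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/tasks/{task_id}/",
        # Assumes the same Token auth header as the import API
        headers={"Authorization": f"Token {token}"},
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The returned dict has the fields shown above, so `fetch_task(...)["data"]["my_image_url"]` gives the image URL.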

Export results API

You can use the API to request a file with exported results.
Read more in API.

Import from Cloud Storage

It is possible to import your data directly from cloud storage (e.g. an AWS S3 bucket). Read more in the Cloud Storages section.

Export to common formats

You can optionally convert raw JSON completions to more common formats using an open source converter tool.
The following export formats are available, depending on the chosen annotation type:

JSON_MIN - minified version of raw JSON completions
CSV / TSV
CONLL2003 - popular format used for CoNLL-2003 named entity recognition challenge
COCO - popular machine learning format used by COCO dataset for object detection and image segmentation tasks
Pascal VOC XML - popular XML-formatted task data used for object detection and image segmentation tasks
Brush Labels to Numpy & PNG - export your brush labels as NumPy 2D arrays and PNG images, one image per label.
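To give a feel for what such a conversion involves, here is a hypothetical sketch that flattens a choices-based project (like the text classification example above) into CSV. It is not the converter tool's implementation: its column layout and the "|" separator for multiple choices are assumptions, not the tool's actual CSV output.

```python
import csv
import io
from typing import Any


def choices_to_csv(tasks: list[dict[str, Any]], data_key: str, from_name: str) -> str:
    """Flatten choices results into CSV, one row per completion.

    Column names and the "|" separator are illustrative assumptions,
    not the converter tool's actual output format.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id", data_key, from_name])
    for task in tasks:
        for completion in task.get("completions", []):
            # Gather every choice made with the given control tag
            labels = [
                choice
                for region in completion.get("result", [])
                if region.get("from_name") == from_name and region.get("type") == "choices"
                for choice in region.get("value", {}).get("choices", [])
            ]
            writer.writerow([task.get("id", ""), task["data"][data_key], "|".join(labels)])
    return out.getvalue()
```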
