Import & Export
This documentation describes Heartex platform version 1.0.0, which is no longer supported. For information about importing data in Label Studio Enterprise Edition, the equivalent of Heartex platform version 2.0.x, see Get data into Label Studio.
How to import tasks?
- Prepare your tasks in the format described below.
- In your project, open Data Manager and then the Add more data dialog.
Read more about Import and Export usage through UI in Data Manager.
Task format
Import and export formats are the same.
The platform stores the JSON-formatted list of tasks.
Each task is a dictionary-like structure, with some specific keys reserved for internal use:
id - task identifier. It is ignored at import and regenerated automatically.
data - task body, represented as a dictionary {"key": "value"}. Task data can hold any number of key-value pairs, but it must include the source keys defined by the label config, i.e. the keys referenced by an object tag's attribute value="$key".
Depending on the object tag type, field values are interpreted differently:
- <Text value="$key">: value is taken as plain text
- <HyperText value="$key">: value is HTML markup
- <HyperText value="$key" encoding="base64">: value is base64-encoded HTML markup
- <Audio value="$key">: value is taken as a valid URL to an audio file
- <AudioPlus value="$key">: value is taken as a valid URL to an audio file with CORS policy enabled on the server side
- <Image value="$key">: value is a valid URL to an image file
For simplicity, "data" may be omitted from the task body at import. In that case the whole task body is interpreted as task["data"], i.e.
[{"key": "value"}] => [{"data": {"key": "value"}}]
You can explore the task data on the task explore page: press the «Input» button in the bottom right corner.
completions (optional) - list of output annotation results; see the example for more details. You can import annotation results in order to use them in a subsequent labeling task.
- id - unique completion identifier
- lead_time - time in seconds spent to create this completion
- ground_truth - mark completion as ground truth (false by default)
- result - completion result data
- id - unique completion result identifier
- from_name - name of the tag that was used to label region (control tags)
- to_name - name of the object tag that provided the region to be labeled (object tags)
- type - type of the labeling/tag
- value - tag specific value that includes the labeling result details. The exact structure of value depends on the chosen labeling tag. Explore each tag for more details.
You can explore the result on the task explore page: press the «Result» button in the bottom right corner.
predictions (optional) - list of machine learning prediction results (aka pre-labeling results). Importing predictions is useful for automatic task prelabeling & active learning & exploration. Follows the same format as completion, with some additional fields related to machine learning inference:
- score - the overall result score (probabilistic output, confidence level, etc.)
- model_version - string field in any format, it will be displayed in the labeling interface.
You may find more extended information about task and completion structure in Import API.
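The shorthand rule for a missing "data" key and the handling of "id" at import can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the platform's own code; the function name normalize_task is hypothetical.

```python
from typing import Any, Dict


def normalize_task(task: Dict[str, Any]) -> Dict[str, Any]:
    """Return a task in the full form described in this section.

    Hypothetical helper: it mirrors the documented rules, it is not
    taken from the platform itself.
    """
    if "data" not in task:
        # Shorthand form: the whole body is interpreted as task["data"].
        return {"data": dict(task)}
    # Full form: "id" is ignored at import, so drop it and keep the rest
    # ("data", optional "completions" and "predictions", and so on).
    return {key: value for key, value in task.items() if key != "id"}


bare = normalize_task({"key": "value"})
full = normalize_task({"id": 123, "data": {"key": "value"}, "predictions": []})
```

Running tasks through a helper like this before import makes both input forms look the same to any later validation step.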
External resources & BLOBs
Images, audio, video, and other external files must be uploaded to a hosting service accessible over http/https. Your JSON/CSV/TSV task files must contain valid http/https URLs to them. As an example, let's prepare tasks with images for import:
- Upload the files to any hosting service, or serve them locally with any web server.
- Copy the http/https links to your images.
- Create tasks.json like this:
[{ "image_source": "http://example.com/test1.jpg" }, { "image_source": "http://example.com/test2.jpg" }]
- Go to the Add more data dialog and select the prepared file.
Don't forget about CORS settings when importing tasks. CORS must be allowed and properly configured on the external hosting; otherwise task data sources won't be loaded.
Another option to import external resources is to use Cloud Storages.
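For more than a handful of files, the tasks.json file from the steps above can be generated rather than written by hand. The sketch below assumes the label config reads images from a key named image_source, as in the example; build_image_tasks is a hypothetical helper, and the URLs are the placeholder ones used above.

```python
import json


def build_image_tasks(urls, key="image_source"):
    """Return one task per URL, rejecting anything that is not http/https."""
    tasks = []
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an http/https URL: {url!r}")
        tasks.append({key: url})
    return tasks


tasks = build_image_tasks([
    "http://example.com/test1.jpg",
    "http://example.com/test2.jpg",
])
# Serialize the list; save this string as tasks.json and select that file
# in the Add more data dialog.
tasks_json = json.dumps(tasks, indent=2)
```

The scheme check only catches obviously wrong links; it does not verify that the files are reachable or that CORS is configured on the host.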
Example
Here is an example of a config and a task list with a single element, for a text classification project:
<View>
  <Text name="message" value="$my_text"/>
  <Choices name="sentiment_class" toName="message">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
  </Choices>
</View>
[{
  # "id" is a reserved field, avoid using it when importing tasks
  "id": 123,
  # "data" must contain the "my_text" field defined by the labeling config,
  # and can optionally include other fields
  "data": {
    "my_text": "Opossum is great",
    "ref_id": 456,
    "meta_info": {
      "timestamp": "2020-03-09 18:15:28.212882",
      "location": "North Pole"
    }
  },
  # "completions" is the list of annotation results matching the labeling config schema
  "completions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Positive"]
      }
    }]
  }],
  # "predictions" are similar to "completions",
  # except that they also include some ML related fields like the prediction "score"
  "predictions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Neutral"]
      }
    }],
    # score is used for active learning sampling mode
    "score": 0.95
  }]
}]
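Because task data must contain the source keys referenced by the config, it can help to check a task file against the config before importing. The sketch below uses the config from this example and Python's standard XML parser. The helpers source_keys and missing_keys are hypothetical, and the check assumes that any value attribute starting with "$" names a source key.

```python
import xml.etree.ElementTree as ET

LABEL_CONFIG = """
<View>
  <Text name="message" value="$my_text"/>
  <Choices name="sentiment_class" toName="message">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
  </Choices>
</View>
"""


def source_keys(config_xml):
    """Collect every key referenced as value="$key" in the config."""
    keys = set()
    for element in ET.fromstring(config_xml.strip()).iter():
        value = element.get("value", "")
        if value.startswith("$"):
            keys.add(value[1:])
    return keys


def missing_keys(task, config_xml):
    """Return the source keys that the task data does not provide."""
    # Accept both the full form and the shorthand form without "data".
    data = task.get("data", task)
    return source_keys(config_xml) - set(data)


ok = missing_keys({"data": {"my_text": "Opossum is great", "ref_id": 456}}, LABEL_CONFIG)
bad = missing_keys({"text": "wrong key name"}, LABEL_CONFIG)
```

An empty result means the task carries every key the config needs; extra keys such as ref_id are allowed and simply ignored by the check.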
Import file types
You can download an example import file for your project in any supported format from the Add more data dialog. A single file is limited to 250k tasks and 200 MB.
JSON - as described in Task format.
CSV / TSV - when a CSV / TSV formatted text file is used, column names are interpreted as task data keys:
my_text,optional_field
this is a first task,123
this is a second task,456
TXT - in a typical scenario you use only one input data stream (in other words, only one object tag is specified in the label config). In this case you don't need the JSON format; simply write your values in a plain text file, one per line, e.g.
this is a first task
this is a second task
ZIP archives with JSONs, CSV, TSV
RAR archives with JSONs, CSV, TSV
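The CSV / TSV and TXT rules above map naturally onto the JSON task format. The following sketch shows that mapping with Python's csv module; it illustrates the documented interpretation and is not the platform's importer. The function names are hypothetical, and CSV values are kept as strings here, which may differ from how the platform types them.

```python
import csv
import io


def tasks_from_csv(text, delimiter=","):
    """Each row becomes one task; column names become task data keys."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{"data": dict(row)} for row in reader]


def tasks_from_txt(text, key):
    """Each non-empty line becomes the value of the single source key."""
    return [{"data": {key: line}} for line in text.splitlines() if line.strip()]


csv_tasks = tasks_from_csv(
    "my_text,optional_field\n"
    "this is a first task,123\n"
    "this is a second task,456\n"
)
txt_tasks = tasks_from_txt("this is a first task\nthis is a second task\n", key="my_text")
```

For a TSV file, pass delimiter="\t" to tasks_from_csv.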
Supported image formats
.png .jpg .jpeg .tiff .bmp .gif
Supported audio formats
.wav .aiff .mp3 .au .flac
Quick API overview
Read more in the task import API and full task API sections.
Import tasks API
curl -H 'Content-Type: application/json' -H 'Authorization: Token abc123' \
-X POST 'https://app.heartex.ai/api/projects/1/tasks/bulk/' --data @my_file.json
where my_file.json is
[{
  "data": {
    "my_image_url": "https://app.heartex.ai/static/samples/kittens1.jpg"
  }
}, {
  "data": {
    "my_image_url": "https://app.heartex.ai/static/samples/kittens2.jpg"
  }
}]
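The same bulk import call can be prepared from Python with only the standard library. The sketch below builds the request but does not send it. The endpoint, headers, and placeholder token abc123 are copied from the curl call above, while build_import_request and its parameters are hypothetical names.

```python
import json
import urllib.request


def build_import_request(host, project_id, token, tasks):
    """Build (but do not send) a POST request for the bulk task import."""
    url = f"{host}/api/projects/{project_id}/tasks/bulk/"
    return urllib.request.Request(
        url,
        data=json.dumps(tasks).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


tasks = [
    {"data": {"my_image_url": "https://app.heartex.ai/static/samples/kittens1.jpg"}},
    {"data": {"my_image_url": "https://app.heartex.ai/static/samples/kittens2.jpg"}},
]
request = build_import_request("https://app.heartex.ai", 1, "abc123", tasks)
```

Passing the request to urllib.request.urlopen(request) would send it; that step is left out here because it needs a real project ID and token.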
Retrieve task API
You can view a task by opening the following URL in your browser (replace <task_id> with a real task ID, e.g. 2353):
curl https://app.heartex.ai/api/tasks/<task_id>/
The following format specifies a Task:
{
  "id": 2353,
  "data": {
    "my_image_url": "https://app.heartex.ai/static/samples/kittens.jpg"
  },
  "accuracy": 0.0,
  "created_at": "2019-02-04T20:33:51.361394Z",
  "updated_at": "2019-02-04T20:33:51.361430Z",
  "is_labeled": false,
  "project": 2
}
Export results API
You can use an API to request a file with exported results.
Read more in API.
Import from Cloud Storage
It is possible to import your data directly from a cloud storage (e.g. AWS S3 bucket). Read more in Cloud Storages section.
Export to common formats
You can optionally convert and export raw JSON completions to more common formats by applying an open source converter tool.
The following export formats are available, depending on the chosen annotation type:
- JSON_MIN - minified version of json raw completions
- CSV / TSV
- CONLL2003 - popular format used for CoNLL-2003 named entity recognition challenge
- COCO - popular machine learning format used by COCO dataset for object detection and image segmentation tasks
- Pascal VOC XML - popular XML-formatted task data used for object detection and image segmentation tasks
- Brush Labels to Numpy & PNG - export your brush labels to numpy 2d arrays and PNG images. One label is equal to one image.