Netflix Scalable Annotation Service by CharlesW

At Netflix, we have hundreds of microservices, each with its own data models or entities. For example, one service stores a movie entity’s metadata and another stores metadata about images. Sooner or later, all of these services want to annotate their objects or entities. Our team, Asset Management Platform, decided to create a generic service called Marken, which allows any microservice at Netflix to annotate its entities.

Annotations

Sometimes people describe annotations as tags, but that is a limited definition. In Marken, an annotation is a piece of metadata that can be attached to an object from any domain. Our client applications want to generate many different kinds of annotations. A simple annotation, like the one below, states that a particular movie has violence.

Movie Entity with id 1234 has violence.

But there are more interesting cases where users want to store temporal (time-based) or spatial data. In Pic 1 below, we have an example of an application that editors use to review their work. They want to change the color of the gloves to rich black, so they need to mark up that area, in this case with a blue circle, and store a comment for it. This is a typical use case for a creative review application.

An example of storing both time-based and space-based data would be an ML algorithm that identifies characters in a frame and wants to store the following for a video:

In a particular frame (time)
In some area in image (space)
A character name (annotation data)
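The three components above can be pictured as a small record type. This is only an illustrative sketch: the class and field names (`Region`, `CharacterAnnotation`, `frame_number`, and so on) are hypothetical and are not Marken's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Rectangular area in image coordinates (hypothetical field names)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int


@dataclass(frozen=True)
class CharacterAnnotation:
    """One detection: when (frame), where (region), and what (name)."""
    frame_number: int    # time
    region: Region       # space
    character_name: str  # annotation data


# Made-up values, for illustration only.
detection = CharacterAnnotation(
    frame_number=1350,
    region=Region(x_min=20, y_min=30, x_max=40, y_max=60),
    character_name="Example Character",
)
```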

Pic 1: Editors requesting changes by drawing shapes like the blue circle shown above.

Goals for Marken

We wanted to create an annotation service with the following goals:

Allow any entity to be annotated. Teams should be able to define their own data model for annotations.
Annotations can be versioned.
The service should be able to serve real-time (UI) applications, so CRUD and search operations should have low latency.
All data should also be available for offline analytics in Hive/Iceberg.

Schema

Since the annotation service would be used by anyone at Netflix, we needed to support different data models for the annotation object. A data model in Marken is described using a schema, just as we create schemas for database tables.

Our team, Asset Management Platform, owns a different service that has a JSON-based DSL to describe the schema of a media asset. We extended this service to also describe the schema of an annotation object.

{
  "type": "BOUNDING_BOX", ❶
  "version": 0, ❷
  "description": "Schema describing a bounding box",
  "keys": {
    "properties": { ❸
      "boundingBox": {
        "type": "bounding_box",
        "mandatory": true
      },
      "boxTimeRange": {
        "type": "time_range",
        "mandatory": true
      }
    }
  }
}

In the above example, the application wants to represent a rectangular area in a video that spans a range of time.

❶ The schema’s name is BOUNDING_BOX.
❷ Schemas can have versions. This allows users to add or remove properties in their data model. We don’t allow incompatible changes; for example, users cannot change the data type of a property.
❸ The data stored is represented in the “properties” section. In this case, there are two properties:
  boundingBox, with type “bounding_box”. This is basically a rectangular area.
  boxTimeRange, with type “time_range”. This allows us to specify the start and end time for this annotation.
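The versioning rule described above (properties may be added or removed, but an existing property may not change its data type) can be sketched as a small check. This is a minimal illustration under assumptions, not the schema service's actual implementation: the function name `is_compatible` and the extra `label` property with type `string` are hypothetical.

```python
def is_compatible(old_props: dict, new_props: dict) -> bool:
    """Return True if new_props only adds or removes properties.

    A property present in both versions must keep the same "type".
    """
    for name, old_def in old_props.items():
        new_def = new_props.get(name)
        if new_def is not None and new_def["type"] != old_def["type"]:
            return False
    return True


# The "properties" section of the BOUNDING_BOX schema shown above.
v0 = {
    "boundingBox": {"type": "bounding_box", "mandatory": True},
    "boxTimeRange": {"type": "time_range", "mandatory": True},
}

# Adding a property is allowed ("label" and "string" are hypothetical).
v1_added = {**v0, "label": {"type": "string", "mandatory": False}}

# Changing the type of an existing property is rejected.
v1_retyped = {**v0, "boxTimeRange": {"type": "bounding_box", "mandatory": True}}

print(is_compatible(v0, v1_added))    # True
print(is_compatible(v0, v1_retyped))  # False
```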

Geometry Objects

To represent spatial data in an annotation, we use the Well-Known Text (WKT) format. We support the following objects:

Point
Line
MultiLine
BoundingBox
LinearRing

Our model is extensible allowing us to easily add more geometry objects as needed.
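For readers unfamiliar with WKT, the sketch below builds a few WKT strings by hand. Standard WKT has no dedicated bounding box type, and this article does not say how Marken encodes one, so representing the box as a closed `POLYGON` ring is purely an assumption for illustration. The helper names are hypothetical.

```python
def point_wkt(x, y) -> str:
    """WKT for a single point."""
    return f"POINT ({x} {y})"


def linestring_wkt(points) -> str:
    """WKT for a line through the given (x, y) points."""
    coords = ", ".join(f"{x} {y}" for x, y in points)
    return f"LINESTRING ({coords})"


def bounding_box_as_polygon_wkt(x_min, y_min, x_max, y_max) -> str:
    """Assumed encoding: a rectangle as a closed polygon ring."""
    ring = [
        (x_min, y_min),
        (x_max, y_min),
        (x_max, y_max),
        (x_min, y_max),
        (x_min, y_min),  # a WKT ring repeats its first point to close
    ]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


print(point_wkt(20, 30))
# POINT (20 30)
print(linestring_wkt([(20, 30), (40, 60)]))
# LINESTRING (20 30, 40 60)
print(bounding_box_as_polygon_wkt(20, 30, 40, 60))
# POLYGON ((20 30, 40 30, 40 60, 20 60, 20 30))
```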

Temporal Objects

Several applications need to store annotations for videos that include a time component. We allow applications to store time as frame numbers or nanoseconds.

To store time in frames, clients must also store the frames per second. We call this a SampleData, with the following components:

sampleNumber aka frame number
sampleNumerator
sampleDenominator
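To see why the frame rate has to travel with the frame number, here is a minimal conversion sketch. It assumes that `sampleNumerator / sampleDenominator` is the sample rate in frames per second (for example 24000/1001); the article does not spell this out, so treat that interpretation and the function name as assumptions.

```python
from fractions import Fraction

NANOS_PER_SECOND = 1_000_000_000


def sample_to_nanoseconds(sample_number: int,
                          sample_numerator: int,
                          sample_denominator: int) -> int:
    """Convert a frame number to nanoseconds.

    Assumes sample_numerator / sample_denominator is frames per second.
    Exact rational arithmetic avoids drift for rates such as 24000/1001.
    """
    rate = Fraction(sample_numerator, sample_denominator)
    return round(Fraction(sample_number) / rate * NANOS_PER_SECOND)


print(sample_to_nanoseconds(240, 24, 1))          # 10000000000 (10 s at 24 fps)
print(sample_to_nanoseconds(24000, 24000, 1001))  # 1001000000000 (1001 s)
```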

Annotation Object

Just like a schema, an annotation object is also represented in JSON. Here is an example of an annotation for the BOUNDING_BOX schema discussed above.

{
  "annotationId": { ❶
    "id": "188c5b05-e648-4707-bf85-dada805b8f87",
    "version": "0"
  },
  "associatedId": { ❷
    "entityType": "MOVIE_ID",
    "id": "1234"
  },
  "annotationType": "ANNOTATION_BOUNDINGBOX", ❸
  "annotationTypeVersion": 1,
  "metadata": { ❹
    "fileId": "identityOfSomeFile",
    "boundingBox": {
      "topLeftCoordinates": {
        "x": 20,
        "y": 30
      },
      "bottomRightCoordinates": {
        "x": 40,
        "y": 60
      }
    },
    "boxTimeRange": {
      "startTimeInNanoSec": 566280000000,
      "endTimeInNanoSec": 567680000000
    }
  }
}

❶ The first component is the unique id of this annotation. An annotation is an immutable object, so the identity of the annotation always includes a version. Whenever someone updates this annotation, we automatically increment its version.
❷ An annotation must be associated with some entity that belongs to some microservice. In this case, the annotation was created for a movie with id “1234”.
❸ We then specify the schema type of the annotation. In this case it is BOUNDING_BOX.
❹ The actual data is stored in the metadata section of the JSON. As discussed above, there is a bounding box and a time range in nanoseconds.
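The immutability rule in the first point can be illustrated with a short sketch: an update never modifies the stored object, it produces a new object with the same id and the next version. The function name `update_annotation` is hypothetical, and this is not Marken's actual API, only one way to picture the behavior described.

```python
import copy


def update_annotation(current: dict, new_metadata: dict) -> dict:
    """Return a new annotation version; the input is left unchanged."""
    updated = copy.deepcopy(current)
    updated["metadata"] = copy.deepcopy(new_metadata)
    # The sample object stores the version as a string, so keep that form.
    next_version = int(current["annotationId"]["version"]) + 1
    updated["annotationId"]["version"] = str(next_version)
    return updated


v0 = {
    "annotationId": {
        "id": "188c5b05-e648-4707-bf85-dada805b8f87",
        "version": "0",
    },
    "metadata": {
        "boxTimeRange": {
            "startTimeInNanoSec": 566280000000,
            "endTimeInNanoSec": 567680000000,
        }
    },
}

# Hypothetical update: extend the end of the time range.
v1 = update_annotation(v0, {
    "boxTimeRange": {
        "startTimeInNanoSec": 566280000000,
        "endTimeInNanoSec": 568000000000,
    }
})

print(v0["annotationId"]["version"], v1["annotationId"]["version"])  # 0 1
```

Both versions share the same `id`, so the pair of id and version is what identifies a particular state of the annotation.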

Base schemas

Just like in Object Oriented Programming, our schema service allows schemas to be inherited from each other. This allows our clients to create an “is-a-type-of” relationship bet
