Overview

Introduction

The Similarity API provides an interface for measuring the relatedness between labor market entities, such as occupations and skills, based on their co-occurrence in job postings. It enables you to analyze career transitions and skill transferability by comparing how closely two entities align within the labor market.

This API is powered by a probability-based machine learning model that uses job postings data to calculate similarity scores between entities. Scores range between 0 and 1, where higher values represent stronger relationships. These values indicate how likely a transition is between two occupations or how transferable a skill is between roles.

Similarity scores are also used as inputs into other Lightcast models such as Defining, Distinguishing & Necessary (DDN) and Career Pathways for advanced workforce analytics.

Note: By default, all clients are allowed a maximum of 5 requests per second. Contact us if you require higher limits.
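One way to stay under the default limit is to pace requests on the client side. The sketch below is a minimal throttle that spaces calls at least 1/5 of a second apart. The class name RequestPacer and its parameters are hypothetical helpers for illustration, not part of the Lightcast API, and the even-spacing strategy is an assumption about how you might want to pace calls.

```python
import time


class RequestPacer:
    """Spaces calls so that roughly `max_per_second` requests start per second.

    Hypothetical helper for illustration; not part of the Lightcast API.
    """

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent; return the seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            waited = self._next_allowed - now
            self._sleep(waited)
            # Re-read the clock in case the sleep overshot.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
        return waited
```

Call pacer.wait() immediately before each API request. The clock and sleep arguments exist only so the pacing logic can be exercised without real delays.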

Methodology

The Similarity model is probability-based and uses different statistical techniques depending on the type of relationship being measured.

Comparison Type | Methodology
Occupation–Occupation | Calculated using the cosine similarity of the skill vectors for two occupations. A frequency cutoff of 1% is applied to skill vectors to remove low-frequency noise and enhance model accuracy.
Skill–Occupation | Represents the proportion of a skill’s occurrence within an occupation over the past 12 months. Formula: (Number of vacancies with both the skill and occupation) ÷ (Total number of vacancies mentioning the skill).
Occupation–Skill | Represents the proportion of vacancies within an occupation that demand a given skill in the past 12 months. Formula: (Number of vacancies with both the occupation and skill) ÷ (Total number of vacancies for the occupation).
Skill–Skill | Measures how often two skills co-occur or appear in similar contexts within job postings. This is computed using Singular Value Decomposition (SVD) on a postings-by-skill matrix, with scores derived from vector similarity distances.
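To illustrate the Occupation–Occupation calculation, here is a minimal sketch of cosine similarity over two skill vectors with a 1% frequency cutoff. It assumes each skill vector maps a skill name to the share of the occupation's postings that mention it, and that the cutoff drops skills whose share is below 0.01. Both are assumptions about how the cutoff might be applied, the function names are hypothetical, and the figures are made up for illustration.

```python
import math


def apply_cutoff(skill_shares, cutoff=0.01):
    """Drop skills whose share of postings falls below the cutoff (assumed rule)."""
    return {skill: share for skill, share in skill_shares.items() if share >= cutoff}


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * b[skill] for skill, value in a.items() if skill in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy skill vectors (invented values, not Lightcast data).
data_analyst = {"SQL": 0.60, "Python": 0.40, "Forklift Operation": 0.001}
data_engineer = {"SQL": 0.50, "Python": 0.55, "Apache Spark": 0.30}

score = cosine_similarity(apply_cutoff(data_analyst), apply_cutoff(data_engineer))
print(round(score, 2))  # roughly 0.90 for these toy vectors
```

Because the shares are non-negative, the result always falls between 0 and 1, and swapping the two arguments gives the same score.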

Similarities within the same taxonomic class (for example, occupation–occupation) are bi-directional, meaning the similarity between A and B is the same as between B and A.

However, cross-class similarities (for example, occupation–skill or skill–occupation) are directional, because each direction conveys a different relationship. Therefore, the similarity between entity A → B is not necessarily equal to that of B → A, and they should not be used interchangeably.
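To make the directional point concrete, the sketch below applies the Skill–Occupation and Occupation–Skill formulas above to a toy set of postings. The data structure, function names, and postings are hypothetical, and the 12-month window is assumed to have been applied already.

```python
def skill_to_occupation(postings, skill, occupation):
    """(Postings with both the skill and occupation) / (postings mentioning the skill)."""
    with_skill = [p for p in postings if skill in p["skills"]]
    if not with_skill:
        return 0.0
    both = sum(1 for p in with_skill if p["occupation"] == occupation)
    return both / len(with_skill)


def occupation_to_skill(postings, occupation, skill):
    """(Postings with both the occupation and skill) / (postings for the occupation)."""
    in_occupation = [p for p in postings if p["occupation"] == occupation]
    if not in_occupation:
        return 0.0
    both = sum(1 for p in in_occupation if skill in p["skills"])
    return both / len(in_occupation)


# Toy postings (invented for illustration).
postings = [
    {"occupation": "Data Analyst", "skills": {"SQL", "Excel"}},
    {"occupation": "Data Analyst", "skills": {"Excel"}},
    {"occupation": "Accountant", "skills": {"Excel"}},
    {"occupation": "Accountant", "skills": {"Excel", "Auditing"}},
]

# Excel -> Data Analyst: 2 of the 4 postings mentioning Excel are Data Analyst postings.
s2o = skill_to_occupation(postings, "Excel", "Data Analyst")  # 0.5
# Data Analyst -> Excel: both Data Analyst postings mention Excel.
o2s = occupation_to_skill(postings, "Data Analyst", "Excel")  # 1.0
```

The numerator is the same in both directions. Only the denominator changes, which is why the two scores differ.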

API Endpoint

This category contains multiple endpoints. Refer to the individual API docs under the Similarity API section for details on each one.

Request Headers

(*) Indicates a required header.

All requests must include the following headers:

Header Key | Value | Description
*Authorization | Bearer <ACCESS_TOKEN> | The token obtained from the Authentication API. Include it in this header to authorize the request. It must be included in all authenticated requests.
Content-Type | application/json | Required for endpoints that accept a request body. Specifies that the body is in JSON format.
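As a small illustration of the headers above, the following sketch assembles them as a Python dictionary. The function name build_headers and its has_body flag are hypothetical helpers, not part of any Lightcast client library.

```python
def build_headers(access_token, has_body=False):
    """Return request headers for a Similarity API call.

    Hypothetical helper: `has_body` should be True for endpoints
    that accept a JSON request body.
    """
    headers = {"Authorization": f"Bearer {access_token}"}
    if has_body:
        headers["Content-Type"] = "application/json"
    return headers
```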

Authentication

All endpoints in this category require an OAuth 2.0 Bearer Token for authentication. Tokens are granted through Lightcast’s Authentication API and are valid for 1 hour.

To access the Similarity API, the token must include the following scope:

  • similarity (for expanded access)

Sample Request

curl --request POST \
  --url https://auth.emsicloud.com/connect/token \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data client_id=CLIENT_ID \
  --data client_secret=CLIENT_SECRET \
  --data grant_type=client_credentials \
  --data scope=similarity

Request Parameters

Parameter | Type | Description
client_id | string | The client ID provided by Lightcast.
client_secret | string | The client secret associated with your client ID.
grant_type | string | Must be set to client_credentials.
scope | string | Set to similarity to request access to the Similarity API.
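For readers working in Python, the sample curl request could be expressed roughly as follows using only the standard library. This is a sketch rather than an official client: the function name build_token_request is hypothetical, and the request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request

TOKEN_URL = "https://auth.emsicloud.com/connect/token"


def build_token_request(client_id, client_secret, scope="similarity"):
    """Build (but do not send) the form-encoded token request."""
    form = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Sending it requires network access and valid credentials:
#   import json
#   request = build_token_request("CLIENT_ID", "CLIENT_SECRET")
#   with urllib.request.urlopen(request) as response:
#       token = json.load(response)
```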

Sample Response

{
  "access_token": "<ACCESS_TOKEN>",
  "expires_in": 3600,
  "token_type": "Bearer"
}

Note that tokens expire after 3600 seconds. To maintain uninterrupted access, refresh the token before it expires. For more information, refer to the Authentication guide.
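One possible way to refresh ahead of expiry is to cache the token and request a new one shortly before expires_in elapses. The sketch below assumes a caller-supplied fetch_token function that returns the parsed JSON response shown above. The class name TokenCache and the 60-second safety margin are illustrative choices, not Lightcast recommendations.

```python
import time


class TokenCache:
    """Caches an access token and refetches it shortly before it expires.

    Hypothetical helper: `fetch_token` is any callable returning a dict
    with "access_token" and "expires_in" keys.
    """

    def __init__(self, fetch_token, refresh_margin=60, clock=time.monotonic):
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin  # seconds before expiry to refetch
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, fetching a new one when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._refresh_margin:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            self._expires_at = now + payload["expires_in"]
        return self._token
```

Calling cache.get() before each request returns the cached token until it is within the margin of expiring, then fetches a new one.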

Response Parameters

Parameter | Type | Description
access_token | string | The token used to authorize API requests.
expires_in | integer | Number of seconds before the token expires.
token_type | string | Indicates the type of token. Always Bearer.

Status Code Summary

Code | Meaning | Description
200 | OK | Request was successful.
400 | Bad Request | The request was unacceptable, often due to missing a required parameter.
401 | Unauthorized | Authentication error.
404 | Not Found | Resource not found.
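A client might translate these status codes into exceptions along the following lines. This is a sketch only: the names SimilarityApiError and check_status are hypothetical, and any code not listed in the table is treated as unexpected.

```python
STATUS_MEANINGS = {
    200: "OK",
    400: "Bad Request",
    401: "Unauthorized",
    404: "Not Found",
}


class SimilarityApiError(Exception):
    """Raised for any non-200 response (hypothetical error type)."""

    def __init__(self, status_code, meaning):
        super().__init__(f"{status_code} {meaning}")
        self.status_code = status_code
        self.meaning = meaning


def check_status(status_code):
    """Return None on 200; raise SimilarityApiError for anything else."""
    if status_code == 200:
        return
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected status")
    raise SimilarityApiError(status_code, meaning)
```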