Introduction
The Similarity API provides an interface for measuring the relatedness between labor market entities, such as occupations and skills, based on their co-occurrence in job postings. It lets you analyze career transitions and skill transferability by comparing how closely two entities align within the labor market.
This API is powered by a probability-based machine learning model that uses job postings data to calculate similarity scores between entities. Scores range between 0 and 1, where higher values represent stronger relationships. These values indicate how likely a transition is between two occupations or how transferable a skill is between roles.
Similarity scores are also used as inputs to other Lightcast models, such as Defining, Distinguishing & Necessary (DDN) and Career Pathways, for advanced workforce analytics.
Note: By default, all clients are allowed a maximum of 5 requests per second. Contact us if you require higher limits.
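To stay under the default limit of 5 requests per second, a client can throttle its own calls. The following is a minimal sketch of a sliding-window throttle, not part of the API itself; the `RateLimiter` class and its parameters are hypothetical names introduced for illustration.

```python
import time


class RateLimiter:
    """Client-side throttle: at most `max_calls` calls per `period` seconds."""

    def __init__(self, max_calls=5, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._stamps = []      # start times of recent calls

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have left the sliding window.
        self._stamps = [t for t in self._stamps if now - t < self.period]
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps = [t for t in self._stamps if now - t < self.period]
        self._stamps.append(now)
```

Calling `limiter.wait()` immediately before each request keeps a single-threaded client within the limit; a multi-threaded client would additionally need a lock around `wait()`.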
Methodology
The Similarity model is probability-based and uses different statistical techniques depending on the type of relationship being measured.
| Comparison Type | Methodology |
|---|---|
| Occupation–Occupation | Calculated using the cosine similarity of the skill vectors for two occupations. A frequency cutoff of 1% is applied to skill vectors to remove low-frequency noise and enhance model accuracy. |
| Skill–Occupation | Represents the proportion of a skill’s occurrence within an occupation over the past 12 months. Formula: (Number of vacancies with both the skill and occupation) ÷ (Total number of vacancies mentioning the skill). |
| Occupation–Skill | Represents the proportion of vacancies within an occupation that demand a given skill in the past 12 months. Formula: (Number of vacancies with both the occupation and skill) ÷ (Total number of vacancies for the occupation). |
| Skill–Skill | Measures how often two skills co-occur or appear in similar contexts within job postings. This is computed using Singular Value Decomposition (SVD) on a postings-by-skill matrix, with scores derived from vector similarity distances. |
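The occupation–occupation calculation can be illustrated with a small sketch. This is not Lightcast's implementation: the skill frequencies are made-up values, the function names are hypothetical, and the assumption that the 1% cutoff drops skills with a frequency below 0.01 is an interpretation of the description above.

```python
import math


def apply_cutoff(skill_freq, cutoff=0.01):
    """Drop skills whose frequency is below the cutoff (assumed: < 1%)."""
    return {skill: f for skill, f in skill_freq.items() if f >= cutoff}


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


# Hypothetical skill frequencies (share of postings mentioning each skill).
data_analyst = {"SQL": 0.60, "Python": 0.45, "Excel": 0.30, "Welding": 0.002}
data_engineer = {"SQL": 0.70, "Python": 0.65, "Spark": 0.40}

score = cosine_similarity(apply_cutoff(data_analyst), apply_cutoff(data_engineer))
```

Because the frequencies are non-negative, the resulting score falls between 0 and 1, and swapping the two arguments gives the same value.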
Similarities within the same taxonomic class (for example, occupation–occupation) are bi-directional, meaning the similarity between A and B is the same as between B and A.
However, cross-class similarities (for example, occupation–skill or skill–occupation) are directional, because each direction conveys a different relationship. Therefore, the similarity between entity A → B is not necessarily equal to that of B → A, and they should not be used interchangeably.
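The directionality of cross-class scores follows from the two formulas using different denominators. The sketch below applies both formulas to a handful of invented postings; the data, the function names, and the posting structure are assumptions for illustration, and the 12-month window is not modeled.

```python
def skill_to_occupation(postings, skill, occupation):
    """(Postings with both skill and occupation) / (postings mentioning the skill)."""
    with_skill = [p for p in postings if skill in p["skills"]]
    if not with_skill:
        return 0.0
    both = sum(1 for p in with_skill if p["occupation"] == occupation)
    return both / len(with_skill)


def occupation_to_skill(postings, occupation, skill):
    """(Postings with both occupation and skill) / (postings for the occupation)."""
    in_occupation = [p for p in postings if p["occupation"] == occupation]
    if not in_occupation:
        return 0.0
    both = sum(1 for p in in_occupation if skill in p["skills"])
    return both / len(in_occupation)


# Hypothetical postings, one dict per vacancy.
postings = [
    {"occupation": "Data Analyst", "skills": {"SQL", "Excel"}},
    {"occupation": "Data Analyst", "skills": {"Excel"}},
    {"occupation": "Data Engineer", "skills": {"SQL", "Python"}},
    {"occupation": "Data Engineer", "skills": {"SQL"}},
    {"occupation": "Accountant", "skills": {"Excel"}},
]

s2o = skill_to_occupation(postings, "SQL", "Data Analyst")  # 1 of 3 SQL postings
o2s = occupation_to_skill(postings, "Data Analyst", "SQL")  # 1 of 2 analyst postings
```

Here the numerator is the same single posting in both directions, yet the scores differ (one third versus one half) because the denominators count different populations.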
API Endpoint
There are multiple endpoints within this category. Refer to individual API docs under the Similarity API section.
Request Headers
(*) indicates a required header.
All requests must include the following headers:
| Header Key | Value | Description |
|---|---|---|
| *Authorization | Bearer <ACCESS_TOKEN> | This is the token obtained from the Authentication API. Include it in this header to authorize the request. (This must be included in all authenticated requests) |
| Content-Type | application/json | Required for endpoints that accept a request body. Specifies that the body is in JSON format. |
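As an illustration of the header table, a client might assemble the headers with a small helper like the one below. The `build_headers` function and its `has_body` flag are hypothetical names, not part of any Lightcast library.

```python
def build_headers(access_token, has_body=False):
    """Build request headers: Authorization always, Content-Type only with a body."""
    if not access_token:
        raise ValueError("access_token is required")
    headers = {"Authorization": f"Bearer {access_token}"}
    if has_body:
        # Only endpoints that accept a request body need this header.
        headers["Content-Type"] = "application/json"
    return headers
```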
Authentication
All endpoints in this category require an OAuth 2.0 Bearer Token for authentication. Tokens are granted through Lightcast’s Authentication API and are valid for 1 hour.
To access the Similarity API, the token must include the following scope:
similarity (for expanded access)
Sample Request
curl --request POST \
--url https://auth.emsicloud.com/connect/token \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data client_id=CLIENT_ID \
--data client_secret=CLIENT_SECRET \
--data grant_type=client_credentials \
--data scope=similarity
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| client_id | string | The client ID provided by Lightcast. |
| client_secret | string | The client secret associated with your client ID. |
| grant_type | string | Must be set to client_credentials. |
| scope | string | The scope to request. Use similarity, subject to the access granted to your client. |
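The same token request can be expressed in Python. The sketch below uses only the standard library and builds the request without sending it; `build_token_request` is a hypothetical helper name, while the URL, form fields, and content type are taken from the sample request above.

```python
import urllib.parse
import urllib.request

TOKEN_URL = "https://auth.emsicloud.com/connect/token"


def build_token_request(client_id, client_secret, scope="similarity"):
    """Build (but do not send) the form-encoded POST for a client-credentials token."""
    form = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` and parsing the body with `json.load` would yield the token response; that step is left out here so the example runs without network access or real credentials.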
Sample Response
{
"access_token": "<ACCESS_TOKEN>",
"expires_in": 3600,
"token_type": "Bearer"
}
Note that tokens expire after 3600 seconds. To maintain uninterrupted access, request a new token before the current one expires. For more information, refer to the Authentication guide.
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| access_token | string | The token used to authorize API requests. |
| expires_in | integer | Number of seconds before the token expires. |
| token_type | string | Indicates the type of token. (Always Bearer) |
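Because a token is only valid for `expires_in` seconds, a long-running client may want to cache it and fetch a new one shortly before expiry. The following is one possible sketch; the `TokenCache` class, the `fetch_token` callable, and the 60-second safety margin are assumptions for illustration, not documented behavior.

```python
import time


class TokenCache:
    """Cache an access token and renew it `margin` seconds before it expires."""

    def __init__(self, fetch_token, margin=60, clock=time.monotonic):
        self._fetch_token = fetch_token  # callable returning the token response dict
        self._margin = margin
        self._clock = clock              # injectable for testing
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, requesting a new one if needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            response = self._fetch_token()
            self._token = response["access_token"]
            self._expires_at = now + response["expires_in"]
        return self._token
```

Here `fetch_token` would wrap the actual call to the Authentication API and return the parsed JSON response.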
Status Code Summary
| Code | Meaning | Description |
|---|---|---|
| 200 | OK | Request was successful. |
| 400 | Bad Request | The request was unacceptable, often due to missing a required parameter. |
| 401 | Unauthorized | Authentication error. |
| 404 | Not Found | Resource not found. |
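A client can translate these status codes into errors in one place. The sketch below is an assumption about how a caller might do so; `SimilarityApiError` and `check_status` are hypothetical names, and only the codes listed in the table above are mapped.

```python
STATUS_MEANINGS = {
    200: "OK",
    400: "Bad Request",
    401: "Unauthorized",
    404: "Not Found",
}


class SimilarityApiError(Exception):
    """Raised for any non-200 response."""

    def __init__(self, status, meaning):
        super().__init__(f"{status} {meaning}")
        self.status = status
        self.meaning = meaning


def check_status(status):
    """Return silently on 200; raise SimilarityApiError otherwise."""
    if status == 200:
        return
    # Codes outside the documented table get a generic label.
    raise SimilarityApiError(status, STATUS_MEANINGS.get(status, "Unexpected status"))
```

A 401 is usually worth handling separately, for example by requesting a fresh token and retrying once.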
