Concepts
Agnitio is organized around the concept of a dataset. A dataset encapsulates what has been measured and how it has been described.
- The “what” is called metrics (e.g., jobs, earnings).
- The “how” is called dimensions (e.g., occupation, industry, area).
Example: US Industry Dataset
Dimensions:
- Area – Nation, states, and counties
- Industry – Industries as defined by a recent version of the NAICS standard
- Class of Worker – Employed versus self-employed
Metrics:
- Jobs – Average number of jobs over the year
- Earnings – Total earnings for the year
- Establishments – Average number of establishments
This dataset allows querying the total (sum) number of jobs in a particular area (or group of areas) for all or selected industries and employment classes.
Time is treated as a special dimension that combines with metrics. For a given combination of areas, industries, and classes, you can request multiple years of data in a single call.
1. Dataset Versioning
Datasets are versioned, generally by time. Lightcast produces regular releases for its core datasets, including both historical and projected data.
- For example, the 2023.3 and 2023.4 versions of the US Industry dataset cover the same years, but 2023.4 may include updated sources or adjusted estimates.
- API consumers should generally use the latest version, unless working on a long-term project that requires consistent data.
The latest version of any dataset can be found by querying the /meta/dataset/<name> endpoint.
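As a minimal sketch of picking the newest release from a list of version labels, the helper below compares labels numerically. It assumes every label has the year.release shape seen in the examples (such as 2024.3); the function names are hypothetical and not part of the API.

```python
def version_key(version: str) -> tuple[int, int]:
    """Turn a label such as "2024.3" into a sortable (year, release) pair."""
    year, release = version.split(".")
    return int(year), int(release)


def latest_version(versions: list[str]) -> str:
    """Return the newest label, comparing numerically rather than as text."""
    return max(versions, key=version_key)


# Version labels as they appear in the metadata examples in this document.
versions = ["2023.4", "2024.1", "2024.2", "2024.3"]
print(latest_version(versions))
```

Comparing numeric pairs rather than raw strings avoids misordering if a release number ever reaches two digits.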
2. Hierarchies
Dimensions are organized as hierarchical taxonomies. For example, Lightcast's standard U.S. Area dimension follows the FIPS (Federal Information Processing Standards) system for states and counties:
- Nation
- State
- County
Each county has exactly one parent state. Example:
- Latah County, ID → code 16057 → parent State: Idaho → code 16
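The county-to-state relationship can be sketched in a few lines, because a county FIPS code is the two-digit state code followed by a three-digit county code. The helper name is hypothetical. Returning the state code without a leading zero is an assumption, based on the sample hierarchy response later in this document, which lists Alabama as "1".

```python
def parent_state_code(county_code: str) -> str:
    """Derive the parent state's code from a county FIPS code."""
    if not county_code.isdigit() or not 4 <= len(county_code) <= 5:
        raise ValueError(f"not a county FIPS code: {county_code!r}")
    padded = county_code.zfill(5)   # restore a dropped leading zero, if any
    return str(int(padded[:2]))     # assumption: state codes carry no leading zero


print(parent_state_code("16057"))   # Latah County, ID
```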
Similar hierarchies exist for:
- Industries
- Occupations
- Classes of worker
- Gender, race, ethnicity, and other demographic facets
The contents of a dimension may vary between dataset versions. Taxonomies are updated regularly to reflect changes in base data sources.
Data Discovery
Agnitio provides metadata endpoints that help you understand what datasets are available, their versions, and how they are structured. These endpoints allow you to explore datasets before making queries. The datasets you see depend on your Lightcast contract.
1. List Available Datasets
Use the /meta endpoint to return all datasets you can access and their available versions.
curl --request GET \
--url https://agnitio.emsicloud.com/meta \
--header 'Authorization: bearer <access_token>'
Response:
{
"datasets": [
{
"name": "emsi.us.industry",
"versions": [
"2023.4",
"2024.1",
"2024.2",
"2024.3"
]
},
{
"name": "emsi.us.unemployment.age",
"versions": [
"2023.4",
"2024.1",
"2024.2",
"2024.3"
]
}
]
}
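A response of this shape is easy to index by dataset name. The sketch below parses a trimmed copy of the sample above; the helper name is hypothetical, and only the datasets, name, and versions keys shown in the sample are assumed.

```python
import json

# Trimmed copy of the sample /meta response shown above.
SAMPLE = """
{
  "datasets": [
    {"name": "emsi.us.industry", "versions": ["2023.4", "2024.1", "2024.2", "2024.3"]},
    {"name": "emsi.us.unemployment.age", "versions": ["2023.4", "2024.1", "2024.2", "2024.3"]}
  ]
}
"""


def versions_by_dataset(payload: dict) -> dict[str, list[str]]:
    """Map each dataset name to its list of available versions."""
    return {entry["name"]: list(entry["versions"]) for entry in payload["datasets"]}


catalog = versions_by_dataset(json.loads(SAMPLE))
print(sorted(catalog))
```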
2. Dataset Definitions
Add /definitions to the path to return dataset metadata, including descriptions and use cases.
curl --request GET \
--url https://agnitio.emsicloud.com/meta/definitions \
--header 'Authorization: bearer <access_token>'
Response:
{
"datasets": [
{
"name": "EMSI.us.MinimumWage",
"versions": [
"2023.4",
"2024.1",
"2024.2",
"2024.3"
],
"description": "# Description\n\nThis dataset shows minimum wage by state by year, from 2001 to the most recent year available. Data is available for all 50 states plus the District of Columbia, as well as the US minimum wage. In cases where cities within the state have different minimum wage laws than the state itself, the state's minimum wage is shown. In all cases where the state's minimum wage is less than the US minimum wage, the US minimum wage is shown.\n\nOnly identity mappings are allowed along StateID.\n\n# Use Cases\n\n#### Questions answered by this dataset:\n\n* What was the minimum wage in Oregon in 2011?\n* What is the minimum wage in the state of Washington?\n* What state has the highest minimum wage?\n\n# Metrics\n\n* MinimumWage: Shows the minimum wage for the state and year selected; if the state minimum wage is less than the nation, then the nation is shown.\n\n# Filters\n\n* Area (Nation, State)\n* Year\n",
"title": "US Minimum Wage"
}
]
}
3. Dataset Versions
Use dataset/<name> to return all available versions of a specific dataset:
curl --request GET \
--url https://agnitio.emsicloud.com/meta/dataset/emsi.us.industry \
--header 'Authorization: bearer <access_token>'
Response:
[
"2023.4",
"2024.1",
"2024.2",
"2024.3"
]
4. Dataset Version Metadata
Use dataset/<name>/<version> to return details about a dataset version, including attributes, metrics, and dimensions.
curl --request GET \
--url https://agnitio.emsicloud.com/meta/dataset/emsi.us.industry/2024.2 \
--header 'Authorization: bearer <access_token>'
Response:
{
"datasetName": "EMSI.US.Industry",
"versionName": "2024.2",
"attributes": {
"estabStartYear": "2004",
"minYearInclusive": "2001",
"estabYear": "2023",
"name": "Industry",
"path": "Industry",
"type": "dataset",
"currentJobsQuarters": "2022Q4,2023Q1,2023Q2,2023Q3",
"maxYearInclusive": "2034",
"description": "# Description\n\nEmsi's industry dataset contains information on industries back to 2001. Job counts are projected 10 years beyond the current calendar year; earnings are not projected. Available areas are nation, states, MSAs, counties, ZIPs, and tracts. Establishment data is not available for ZIP or tract level.\n\nWhen requesting MSA codes or ZIP codes as the area constraint in a query, the code must be prepended by MSA or ZIP, respectively (e.g. for ZIP code 98102, use 'ZIP98102')\n\n# Use Cases\n\n#### Questions answered by this dataset:\n\n* How many manufacturing jobs are projected to exist over the next 10 years in Calvert County, MD?\n* How many Insurance Carriers establishments are there in Nebraska?\n* What is the average wage for workers in hospitals in Ann Arbor, Michigan?\n* How much higher is the average wage today for workers in hospitals than it was in 2005?\n* How many health care jobs exist in the ZIP codes that make up downtown Seattle?\n* How do earnings for Registered Nurses differ in the various parts of Los Angeles County?\n\n# Metrics\n\n* Jobs: The number of occupied positions. This is not quite the same as workers because one worker might fill more than one position.\n* Wages: Total base earnings for the industry. Note that this figure is for the whole industry, not for the average worker in the industry.\n* Supplements: Total supplements for the industry. This figure is also for the whole industry, not for the average worker in the industry.\n* Establishments: The number of physical business locations.\n* Earnings: Total earnings for the industry. Sum of wages and supplements. This figure is also for the whole industry, not for the average worker in the industry.\n* EPW: Earnings Per Worker. The earnings figure (sum of wages and salaries) divided by the number of jobs in the industry.\n* SPW: Supplements Per Worker. Supplements for the industry divided by the number of jobs in the industry.\n* WPW: Wages Per Worker. 
Wages for the industry divided by the number of jobs in the industry.\n\n# Filters\n\n* Class of Worker\n* Area (Nation, State, MSA, County, ZIP, tract)\n* Industry (2 to 6-digit NAICS)\n* Year\n",
"earnYear": "2023",
"currentYear": "2023",
"countryCode": "us",
"datarun": "2024.2",
"displayName": "US Industry (Nation, State, MSA, County, ZIP, Census Tract)",
"releaseDate": "2024-04-12 07:25:42.424191Z",
"numAggPaths_AreaID": "4",
"levelsStored_AreaID": "[1:[1,2,3,4],2:[2,3,4],3:[1,2,3,4],4:[2,3,4]]"
},
"dimensions": [
{
"name": "ClassOfWorker",
"levelsStored": ["2"]
},
{
"name": "Area",
"levelsStored": ["1", "2", "3", "4"]
},
{
"name": "Industry",
"levelsStored": ["1", "2", "3", "4", "5", "6"]
}
],
"metrics": [
{ "name": "Jobs.2001" },
{ "name": "Jobs.2002" },
{ "name": "Jobs.2034" },
{ "name": "Wages.2001" }
]
}
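Metric names in this response pair a measure with a year (for example Jobs.2001). As a rough sketch, they can be grouped to see which years each measure covers. The helper name is hypothetical, and it assumes every metric name ends in a dot followed by a four-digit year, as in the sample.

```python
from collections import defaultdict


def years_by_metric(metrics: list[dict]) -> dict[str, list[int]]:
    """Group names like "Jobs.2001" into {"Jobs": [2001, ...]}."""
    grouped = defaultdict(list)
    for entry in metrics:
        # Split on the last dot so the measure name itself may contain dots.
        measure, _, year = entry["name"].rpartition(".")
        grouped[measure].append(int(year))
    return {measure: sorted(years) for measure, years in grouped.items()}


# The "metrics" array from the sample response above.
sample_metrics = [
    {"name": "Jobs.2001"},
    {"name": "Jobs.2002"},
    {"name": "Jobs.2034"},
    {"name": "Wages.2001"},
]
print(years_by_metric(sample_metrics))
```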
5. Dimension Hierarchies
Use dataset/<name>/<version>/<dimension> to return the full hierarchy of a dataset dimension.
curl --request GET \
--url https://agnitio.emsicloud.com/meta/dataset/emsi.us.industry/2024.2/Area \
--header 'Authorization: bearer <access_token>'
Response:
{
"name": "Area",
"hierarchy": [
{
"name": "United States",
"abbr": "US",
"child": "0",
"display_id_parent": "0",
"parent": "0",
"aggregation_path": "1",
"level_name": "1"
},
{
"name": "United States",
"abbr": "US",
"child": "0",
"display_id_parent": "0",
"parent": "0",
"aggregation_path": "3",
"level_name": "1"
},
{
"name": "Alabama",
"abbr": "AL",
"child": "1",
"display_id_parent": "0",
"parent": "0",
"aggregation_path": "1",
"level_name": "2"
},
{
"name": "Alabama",
"abbr": "AL",
"child": "1",
"display_id_parent": "0",
"parent": "0",
"aggregation_path": "3",
"level_name": "2"
},
{
"name": "Delaware",
"abbr": "DE",
"child": "10",
"display_id_parent": "0",
"parent": "0",
"aggregation_path": "1",
"level_name": "2"
}
]
}
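In the sample above, the same area appears once per aggregation path, so client code usually wants a single path and level. A minimal sketch of that selection follows; the helper name is hypothetical, and the field values are compared as strings because that is how they appear in the sample.

```python
def select_nodes(hierarchy: list[dict], aggregation_path: str, level_name: str) -> list[dict]:
    """Keep the entries on one aggregation path at one level."""
    return [
        node for node in hierarchy
        if node["aggregation_path"] == aggregation_path
        and node["level_name"] == level_name
    ]


# Trimmed copy of the "hierarchy" array from the sample response above.
hierarchy = [
    {"name": "United States", "child": "0", "aggregation_path": "1", "level_name": "1"},
    {"name": "United States", "child": "0", "aggregation_path": "3", "level_name": "1"},
    {"name": "Alabama", "child": "1", "aggregation_path": "1", "level_name": "2"},
    {"name": "Alabama", "child": "1", "aggregation_path": "3", "level_name": "2"},
    {"name": "Delaware", "child": "10", "aggregation_path": "1", "level_name": "2"},
]

states_on_path_1 = select_nodes(hierarchy, "1", "2")
print([node["name"] for node in states_on_path_1])
```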
6. Filtering Dimension Results
You can filter dimension results in two ways:
By Aggregation Path and Level:
Use query parameters like aggpath and level to refine results by hierarchy.
curl --request GET \
--url 'https://agnitio.emsicloud.com/meta/dataset/emsi.us.industry/2024.2/Area?aggpath=3&level=4' \
--header 'Authorization: bearer <access_token>'
By Identifier or Name:
Search within a dimension using identifiers such as FIPS codes or location names.
curl --request GET \
--url https://agnitio.emsicloud.com/meta/dataset/emsi.us.industry/2024.2/Area/search/Latah \
--header 'Authorization: bearer <access_token>'
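Both filtering styles come down to building a URL, which the sketch below does with the standard library. The function names are hypothetical. Percent-encoding a multi-word search term is an assumption, since the source only shows a single-word search.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://agnitio.emsicloud.com/meta/dataset"


def dimension_url(dataset, version, dimension, aggpath=None, level=None):
    """Build the dimension URL, adding aggpath/level only when they are given."""
    url = f"{BASE_URL}/{quote(dataset)}/{quote(version)}/{quote(dimension)}"
    params = {
        key: value
        for key, value in (("aggpath", aggpath), ("level", level))
        if value is not None
    }
    return f"{url}?{urlencode(params)}" if params else url


def search_url(dataset, version, dimension, term):
    """Build the search URL; the term is percent-encoded (an assumption)."""
    return f"{dimension_url(dataset, version, dimension)}/search/{quote(term, safe='')}"


print(dimension_url("emsi.us.industry", "2024.2", "Area", aggpath=3, level=4))
print(search_url("emsi.us.industry", "2024.2", "Area", "Latah"))
```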
Data Queries
1. Basics
Agnitio data queries are performed by constructing a JSON object that describes the query and sending it via POST to the dataset endpoint.
Here’s an example querying the US Industry dataset to get the number of jobs and establishments in Latah County, ID for the full-service restaurant industry:
curl --request POST \
--url https://agnitio.emsicloud.com/emsi.us.industry/2024.2 \
--header 'Authorization: bearer <access_token>' \
--header 'Content-Type: application/json' \
--data '{
"metrics": [
{ "name": "Jobs.2023", "as": "2023 Jobs" },
{ "name": "Establishments.2023" }
],
"constraints": [
{ "dimensionName": "Area", "map": { "Latah County, ID": ["16057"] } },
{ "dimensionName": "Industry", "map": { "Full Service Restaurants": ["722511"] } }
]
}'
Response:
{
"data": [
{
"name": "Area",
"type": "String",
"rows": ["Latah County, ID"]
},
{
"name": "Industry",
"type": "String",
"rows": ["Full Service Restaurants"]
},
{
"name": "2023 Jobs",
"type": "Double",
"rows": [507.7011804944611]
},
{
"name": "Establishments.2023",
"type": "Double",
"rows": [30.25]
}
],
"errors": [],
"timings": [
"Build sortable results: 0.15ms",
"Sort results: 0.16ms",
"Build final response: 0.02ms",
"Parsed constraints: 2.74ms",
"Parsed metrics: 0.09ms",
"Parse query: 3.03ms",
"Generate query cartesian product: 0.07ms",
"Generate parallelization plan: 0.03ms",
"Execute parallel query at: 2.27ms",
"Overall query time: 12.35ms"
],
"totalRows": 1
}
Query structure explained:
- The JSON object contains two required fields: metrics and constraints.
- metrics is an array of objects specifying which measures you want. Each object has a required name field and an optional as field to rename the metric in the response.
- constraints is an array of objects specifying how the dimensions should be filtered or aggregated. In the example, the Area dimension is limited to a single FIPS code, "16057", representing "Latah County, ID".
- Responses are column-oriented, so all arrays under rows have the same length, even if the query returns multiple rows.
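Because responses are column-oriented, client code often pivots them into one record per row. The sketch below does that for a trimmed copy of the sample response; the helper name is hypothetical, and only the data, name, and rows keys from the sample are assumed.

```python
def columns_to_rows(response: dict) -> list[dict]:
    """Pivot the column-oriented "data" array into one dict per result row."""
    columns = response["data"]
    names = [column["name"] for column in columns]
    # zip(*...) walks all "rows" arrays in step, yielding one tuple per row.
    return [
        dict(zip(names, values))
        for values in zip(*(column["rows"] for column in columns))
    ]


# Trimmed copy of the sample query response shown earlier.
sample_response = {
    "data": [
        {"name": "Area", "type": "String", "rows": ["Latah County, ID"]},
        {"name": "Industry", "type": "String", "rows": ["Full Service Restaurants"]},
        {"name": "2023 Jobs", "type": "Double", "rows": [507.7011804944611]},
        {"name": "Establishments.2023", "type": "Double", "rows": [30.25]},
    ],
    "totalRows": 1,
}

for row in columns_to_rows(sample_response):
    print(row["Area"], row["2023 Jobs"])
```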
Understanding the map field:
The map field defines how user-defined names correspond to dimension codes. For example:
{
"dimensionName": "Area",
"map": {
"Seattle Area (10 mile radius)": ["53033", "53035"],
"Seattle Area (20 mile radius)": ["53033", "53035", "53061", "53053"]
}
}
- Each key in the map becomes a value in the response.
- Multiple codes can be combined to form a single user-defined area.
For datasets supporting ZIP and MSA areas, prepend the type to each code (e.g., ZIP98101). FIPS codes do not require a prefix.
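The prefixing rule and the shape of an Area constraint can be sketched as two small helpers. Both function names are hypothetical. The group label in the usage lines is made up; ZIP98101 is the code from the example above.

```python
def area_code(code: str, kind: str = "FIPS") -> str:
    """Return the code as a query expects it: ZIP/MSA codes prefixed, FIPS codes bare."""
    kind = kind.upper()
    if kind in ("ZIP", "MSA"):
        return f"{kind}{code}"
    if kind == "FIPS":
        return code
    raise ValueError(f"unknown area kind: {kind!r}")


def area_constraint(groups: dict[str, list[str]]) -> dict:
    """Wrap user-defined groups of codes in an Area constraint object."""
    return {"dimensionName": "Area", "map": groups}


# Hypothetical single-ZIP group, for illustration only.
constraint = area_constraint({"Example ZIP area": [area_code("98101", "ZIP")]})
print(constraint)
```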
Shorthand: mapLevel
Instead of mapping each child individually, you can use mapLevel to expand a parent to all its children automatically:
{
"dimensionName": "Area",
"mapLevel": {
"level": 3,
"predicate":[16]
}
}
- This expands all children of the specified parent (here, state code 16) to individual entries.
- The shorthand saves time on large queries; the expanded mapping is equivalent to manually listing all FIPS codes:
{
"dimensionName": "Area",
"map": {
"16003": ["16003"],
"16005": ["16005"],
...
"16057": ["16057"],
...
"16999": ["16999"]
}
}
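If you already hold a list of child codes (for instance, collected from the dimension hierarchy endpoint), the expanded form can be built client-side. This is a sketch of the equivalent identity mapping, not of how the service implements mapLevel; the helper name is hypothetical.

```python
def identity_map(codes: list[str]) -> dict[str, list[str]]:
    """Map each code to a single-element group named after itself."""
    return {code: [code] for code in codes}


# A few county codes taken from the expanded example above.
constraint = {
    "dimensionName": "Area",
    "map": identity_map(["16003", "16005", "16057"]),
}
print(constraint["map"]["16057"])
```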
2. Sorting, Offsets, and Limits
Agnitio allows you to control the order of results using the sortBy field. You can sort by one or more metrics or dimensions in ascending or descending order.
{
"metrics": [{"name": "Jobs.2020"}],
"constraints": [
{
"dimensionName": "Area",
"map": {"Seattle Area (10 mile radius)": ["53033", "53035"]}
}
],
"sortBy": [
{"name": "Area", "direction": "ascending"},
{"name": "Jobs.2020", "direction": "descending"}
]
}
By default, all requested data is returned, but you can use offset and limit for pagination:
{
"metrics": [ ... ],
"constraints": [ ... ],
"offset": 100,
"limit": 50
}
- offset skips the first N rows of the results.
- limit restricts the number of rows returned.
These fields are useful for large datasets or when implementing paging in applications.
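A paging loop might generate one query body per page, as in the sketch below. The helper name is hypothetical. It assumes the caller knows the total number of rows up front; treating the totalRows field of a response as that count is an assumption the source does not confirm.

```python
import copy


def paged_queries(query: dict, total_rows: int, page_size: int):
    """Yield copies of the query with offset and limit set for each page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total_rows, page_size):
        page = copy.deepcopy(query)   # leave the caller's query untouched
        page["offset"] = offset
        page["limit"] = page_size
        yield page


base_query = {"metrics": [{"name": "Jobs.2020"}], "constraints": []}
for page in paged_queries(base_query, total_rows=120, page_size=50):
    print(page["offset"], page["limit"])
```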
3. Location Quotient
Location Quotient (LQ) measures geographic concentration of metrics like jobs or establishments. Agnitio can calculate LQ automatically for datasets with an Area dimension:
{
"metrics": [
{
"name": "Jobs.2020",
"as": "Jobs 2020 LQ",
"operation": {
"name": "LocationQuotient",
"geoparent": "0",
"along": "Industry"
}
}
],
"constraints": [
{
"dimensionName": "Area",
"map": {"Latah County, ID": ["16057"]}
},
{
"dimensionName": "Industry",
"map": {"Full Service Restaurants": ["722511"]}
}
]
}
Operation fields explained:
- operation.name – identifies this as a Location Quotient calculation.
- operation.geoparent – the parent area code for comparison. Use 0 for national, or the containing state for state-level LQ.
- operation.along – the dimension along which to compute concentration, usually Industry or Occupation.
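For intuition, the textbook location quotient is the local share of a category divided by the parent area's share of the same category. The sketch below uses that standard definition with made-up numbers. Whether Agnitio computes LQ in exactly this way is an assumption, and the function name is hypothetical.

```python
def location_quotient(local_category: float, local_total: float,
                      parent_category: float, parent_total: float) -> float:
    """Local share of a category divided by the parent area's share of it."""
    local_share = local_category / local_total
    parent_share = parent_category / parent_total
    return local_share / parent_share


# Made-up job counts: the industry holds 5% of local jobs but 2% of parent-area jobs.
lq = location_quotient(500, 10_000, 2_000_000, 100_000_000)
print(round(lq, 2))
```

A value above 1 means the category is more concentrated locally than in the parent area; a value below 1 means it is less concentrated.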
4. Shift Share
Shift Share analyzes changes in a metric over time and attributes them to different factors. Agnitio can calculate Shift Share for datasets with an Area dimension and metrics with a time component:
{
"metrics": [
{
"name":"Jobs.2020",
"operation":{
"name":"ShiftShare",
"geoparent":"0",
"along":"Industry",
"base": "Jobs.2010"
}
}
],
"constraints": [ ... ]
}
Operation fields explained:
- operation.name – identifies this as a Shift Share calculation.
- operation.geoparent – the parent area code for comparison. Use 0 for national, or the containing state for state-level analysis.
- operation.along – the dimension to perform the comparison along, usually Industry or Occupation.
- operation.base – the starting metric for comparison (here, analyzing change from 2010 to 2020).
Response format:
- Shift Share metrics return four columns per metric: Job Change, Parent Growth Effect, Mix Effect, and Competitive Effect.
- Column names are prefixed with the as field from the metric if supplied.
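To make the four columns concrete, here is a sketch of the classic shift-share decomposition with made-up job counts. Mapping the textbook terms onto these column names is an assumption, as is the idea that Agnitio follows this exact formulation; the function name is hypothetical.

```python
def shift_share(local_base, local_end,
                parent_category_base, parent_category_end,
                parent_total_base, parent_total_end):
    """Split a local change into parent-growth, mix, and competitive effects."""
    parent_total_growth = parent_total_end / parent_total_base - 1
    parent_category_growth = parent_category_end / parent_category_base - 1
    local_growth = local_end / local_base - 1
    return {
        "Job Change": local_end - local_base,
        # Change expected if the local category grew like the whole parent economy.
        "Parent Growth Effect": local_base * parent_total_growth,
        # Extra change from the category outpacing (or lagging) the parent economy.
        "Mix Effect": local_base * (parent_category_growth - parent_total_growth),
        # Remainder: local performance relative to the same category in the parent.
        "Competitive Effect": local_base * (local_growth - parent_category_growth),
    }


# Made-up counts: local jobs 400 -> 500, parent industry 1,000 -> 1,100,
# parent economy 10,000 -> 10,500.
result = shift_share(400, 500, 1_000, 1_100, 10_000, 10_500)
for column, value in result.items():
    print(column, round(value, 2))
```

In this formulation the three effects always sum to the job change, which is a useful sanity check on any response.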