Fetch OCR Data

What is OCR data?

The OCR data for an onboarding contains all of the information that was extracted from the ID document during the onboarding. This includes both the textual information and any available barcode or MRZ data that was read and decoded from the back of the ID.

📘
Keep in mind
Because the data varies between ID documents, this endpoint's response is dynamic: it contains only the information relevant to the detected ID. Keep this in mind especially if you consume these endpoints from statically typed languages (like Java or C#), where a JSON parser can break if a property you expect is missing.
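As a minimal sketch of reading a dynamic response defensively, the snippet below treats every property as possibly absent. The helper name `read_name` and the sample values are made up for illustration; only the field names come from this page.

```python
import json


def read_name(ocr: dict) -> dict:
    """Read a few name fields, returning None for any that are absent."""
    # The "name" object itself may be missing, so fall back to an empty dict.
    name = ocr.get("name") or {}
    return {
        "first": name.get("firstName"),
        "paternal_last": name.get("paternalLastName"),
        "maternal_last": name.get("maternalLastName"),  # optional field
    }


# Made-up sample: an ID without a maternal last name.
sample = json.loads('{"name": {"firstName": "ANA", "paternalLastName": "LOPEZ"}}')
parsed = read_name(sample)
empty = read_name({})
```

In Java or C#, the equivalent is usually to mark the corresponding properties as nullable and configure the parser to ignore unknown or missing properties.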

How is OCR data fetched?

To fetch OCR data for a given onboarding session, pass the session's unique interview ID to the fetch OCR data API endpoint. You can test the endpoint yourself by visiting our API documentation. If you do not pass the interview ID, we will attempt to extract it from the session token. See below for a description of all data that is returned in the API response.
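The sketch below shows one way the request could be assembled so that the interview ID is sent only when you have it. The base URL, path, header name, and query parameter name are hypothetical placeholders, not the real values; take those from the API documentation.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholders: replace with the values from the API documentation.
BASE_URL = "https://api.example.com"
OCR_DATA_PATH = "/ocr-data"
TOKEN_HEADER = "X-Session-Token"
ID_PARAM = "id"


def build_ocr_request(session_token: str, interview_id: Optional[str] = None) -> Request:
    """Build a GET request for OCR data without sending it."""
    url = BASE_URL + OCR_DATA_PATH
    if interview_id is not None:
        # Pass the interview ID explicitly when it is known.
        url += "?" + urlencode({ID_PARAM: interview_id})
    # Without an interview ID, the session token alone identifies the session.
    return Request(url, headers={TOKEN_HEADER: session_token}, method="GET")


with_id = build_ocr_request("my-token", "abc123")
without_id = build_ocr_request("my-token")
```

The returned object can then be sent with `urllib.request.urlopen` and the body decoded with `json.loads`.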

Common use cases

Extract information from the ID document attached to a session

Here's a non-exhaustive list of the most commonly used OCR fields from the ID document:

{
    "name": {
        "fullName": "",
        "firstName": "",
        "paternalLastName": "",
        "maternalLastName": "", // Optional
        "givenName": "",
        "middleName": "", // Optional
        "nameSuffix": "", // Optional
        "machineReadableFullName": "", // Optional, full name from Barcode or MRZ
        "givenNameMrz": "", // Optional
        "lastNameMrz": "" // Optional
    },
    "address": "", // Optional, address as read from ID
    "addressFields": {
        "street": "", // Optional
        "colony": "", // Optional, not applicable for all countries
        "postalCode": "", // Optional
        "city": "", // Optional
        "state": "" // Optional
    },
    "typeOfId": "", // ID classification, e.g. Drivers License, Voter Identification
    "issuedAt": 0, // Issue date, expressed as an epoch timestamp in milliseconds
    "expireAt": 0, // Expiration date, expressed as an epoch timestamp in milliseconds
    "issuingCountry": "", // Optional
    "documentNumber": "", // Optional
    "fullAddress": true, // Optional. True if the address from the ID is full (has three lines)
    "cic": "", // Mexican INE only
    "ocr": "" // Mexican INE only
}

Take a look at the list at the bottom of this page for a more detailed description of these and other available fields.
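Because `issuedAt` and `expireAt` are epoch timestamps in milliseconds, they need to be divided by 1000 before most date libraries will accept them. The following is a minimal sketch of that conversion; the helper names `epoch_ms_to_utc` and `is_expired` are illustrative only.

```python
from datetime import datetime, timezone
from typing import Optional


def epoch_ms_to_utc(value: Optional[int]) -> Optional[datetime]:
    """Convert an epoch timestamp in milliseconds to a timezone-aware UTC datetime."""
    if value is None:
        return None
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


def is_expired(ocr: dict, now: datetime) -> Optional[bool]:
    """Return True or False when expireAt is present, or None when it is missing."""
    expire_at = epoch_ms_to_utc(ocr.get("expireAt"))
    if expire_at is None:
        return None
    return expire_at < now


reference = datetime(2024, 1, 1, tzinfo=timezone.utc)
expired = is_expired({"expireAt": 1_700_000_000_000}, reference)
unknown = is_expired({}, reference)
```

Returning `None` for a missing date keeps "unknown" distinct from "not expired", which matters because the response only contains the fields found on the detected ID.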

Extract information from the POA document attached to a session

When you need to extract the address or other information from a proof of address (POA) document added to a session, access the following fields in the JSON response from the API:

{
    "documentType": "", // Classifier of the provided POA document
    "poaName": "", // The name that appears in the provided POA document
    "addressStatementEmissionDate": "", // Issue date, expressed as an epoch timestamp in milliseconds
    "addressFromStatement": "", // Full address read from statement
    "addressFieldsFromStatement": {
        "street": "",
        "colony": "",
        "postalCode": "",
        "city": "",
        "state": ""
    }
}

These fields should be enough to answer the following questions:

What's the address in the POA document?
What name appears in the POA document?
When was the POA document issued?
What kind of document was used as POA?
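As a rough illustration, the four questions above map onto the response fields as follows. The function name `summarize_poa` and the sample values are hypothetical; only the field names come from this page.

```python
def summarize_poa(ocr: dict) -> dict:
    """Collect the POA answers, leaving None for anything the response lacks."""
    return {
        "address": ocr.get("addressFromStatement"),
        "name": ocr.get("poaName"),
        "issued": ocr.get("addressStatementEmissionDate"),
        "document_type": ocr.get("documentType"),
    }


# Made-up partial response: only two of the four fields are present.
summary = summarize_poa({"poaName": "ANA LOPEZ", "documentType": "UTILITY_BILL"})
```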

Other OCR-data extraction endpoints

The samples above come from the standard OCR data endpoint. Three alternative endpoints are also available for OCR data extraction, although you are less likely to need them:

OCR data v2 wraps the response from the ocr-data endpoint under an ocrData field. The data is the same as in the ocr data endpoint.
Second Id's OCR data is required if your flow is configured to attach two ID documents to the same session. It provides the OCR data from the second ID.
Batch fetch OCR data is for fetching OCR data for multiple onboarding sessions.
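If your code may receive responses from both the standard endpoint and OCR data v2, a small normalizing step lets the rest of the code work on the same shape. This is a sketch under the assumption that the only difference is the `ocrData` wrapper described above; `unwrap_ocr` is a hypothetical helper name.

```python
def unwrap_ocr(payload: dict) -> dict:
    """Return the OCR fields, whether or not they are wrapped under "ocrData"."""
    inner = payload.get("ocrData")
    # v2 responses nest the data; standard responses are already flat.
    return inner if isinstance(inner, dict) else payload


flat = unwrap_ocr({"typeOfId": "Passport"})
wrapped = unwrap_ocr({"ocrData": {"typeOfId": "Passport"}})
```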

📘
Scores and OCR-data
Note that while the dashboard shows the OCR data with a score, that score reflects the level of confidence in the data extracted from the captured image of the ID.
This OCR confidence level does not directly affect or alter the score of a session.