
OCR with Vehicle Title

Extract information from a vehicle title into a clean, structured format.


eyepop.structured-OCR.read-vehicle-title:1.0.0

Prompt

You are given an image of a vehicle title document (certificate of title).

Your task is to extract structured data that is clearly visible in the image.

Return ONLY valid JSON.

Do not include explanation.

Do not include markdown.

Do not include...

Run the full prompt in your EyePop.ai dashboard.


Input: Image

Output: JSON

Image size: 512x512 (Small)

Model type: EyePop.ai VLM

How It Works

Verifying vehicle ownership is an important capability for many businesses, but manually reviewing and typing out information from a vehicle title document is slow and prone to human error. Being able to extract this information accurately and automatically is therefore vital for a smooth user experience and for business productivity. The Structured OCR task on the Abilities tab acts as a powerful Optical Character Recognition (OCR) tool, reading the text on a document and outputting it in a clean, structured format.

For example, if a user uploads a photo of a standard California vehicle title, the model should examine the image and categorize the data into specific fields: it will pull "ALEX RIVERS" for the Registered Owner, "FORD" and "MUSTANG" for the Make and Model, "2022" for the Vehicle Year, and isolate the specific Vehicle Identification Number (VIN).
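In JSON, that extraction might look like the sketch below. The field names are illustrative only; the exact schema is defined by the full prompt in your dashboard.

{
  "registered_owner": "ALEX RIVERS",
  "make": "FORD",
  "model": "MUSTANG",
  "vehicle_year": "2022",
  "vin": "<VIN as printed on the title>"
}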

In contrast, if a user uploads an image that is severely blurred, cut off, or obscured by harsh glare, the model should ideally flag that the necessary text cannot be confidently extracted, prompting the user to retake the photo.
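One way to surface that failure mode while still returning only valid JSON is an explicit error field. This is a possible convention, not something the published prompt mandates:

{
  "error": "Text is too blurred or obscured to extract confidently. Please retake the photo."
}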

Our expected inputs are images of vehicle title documents, and the expected output is a structured text format, in this example a JSON object containing the extracted text from the target fields.

SDK Tutorial


First, let’s define the ability.

# Assumes VlmAbilityCreate, TransformInto, and InferRuntimeConfig are
# imported from the eyepop SDK; title_prompt is defined below.
ability_prototypes = [
    VlmAbilityCreate(
        name=f"{NAMESPACE_PREFIX}.structured-OCR.read-vehicle-title",
        description="Extract relevant information from a vehicle title",
        worker_release="qwen3-instruct",
        text_prompt=title_prompt,
        transform_into=TransformInto(),
        config=InferRuntimeConfig(
            max_new_tokens=700,  # budget for the generated JSON
            image_size=512       # matches the 512x512 (Small) input size
        ),
        is_public=False
    )
]
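Note that image_size=512 matches the 512x512 (Small) input size listed above, and max_new_tokens=700 caps the length of the generated JSON so a verbose response cannot run unbounded.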

The prompt we can use here is:

"You are given an image of a vehicle title document (certificate of title).

Your task is to extract structured data that is clearly visible in the image.

Return ONLY valid JSON.

Do not include explanation.

Do not include markdown.

Do not include..."
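In code, this text is stored in the title_prompt variable referenced by the ability definition above. A minimal sketch, assuming the remaining instructions are pasted in from the full prompt in your dashboard:

title_prompt = """You are given an image of a vehicle title document (certificate of title).
Your task is to extract structured data that is clearly visible in the image.
Return ONLY valid JSON.
Do not include explanation.
Do not include markdown.
"""  # append the remaining instructions from the full prompt here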

Next, we can create the ability with the following code:

with EyePopSdk.dataEndpoint(api_key=EYEPOP_API_KEY, account_id=EYEPOP_ACCOUNT_ID) as endpoint:
    for ability_prototype in ability_prototypes:
        # 1. Create a group to hold versions of this ability
        ability_group = endpoint.create_vlm_ability_group(VlmAbilityGroupCreate(
            name=ability_prototype.name,
            description=ability_prototype.description,
            default_alias_name=ability_prototype.name,
        ))
        # 2. Create the ability itself from the prototype
        ability = endpoint.create_vlm_ability(
            create=ability_prototype,
            vlm_ability_group_uuid=ability_group.uuid,
        )
        # 3. Publish it under its alias name
        ability = endpoint.publish_vlm_ability(
            vlm_ability_uuid=ability.uuid,
            alias_name=ability_prototype.name,
        )
        # 4. Tag the published version as "latest"
        ability = endpoint.add_vlm_ability_alias(
            vlm_ability_uuid=ability.uuid,
            alias_name=ability_prototype.name,
            tag_name="latest"
        )
        print(f"created ability {ability.uuid} with alias entries {ability.alias_entries}")

That’s it! To run the prompt against an image, here is some sample evaluation code:

import json
from pathlib import Path

# Import path may vary by SDK version
from eyepop import EyePopSdk
from eyepop.worker.worker_types import Pop, InferenceComponent

# Reference the published ability by its "latest" tag
pop = Pop(components=[
    InferenceComponent(
        ability=f"{NAMESPACE_PREFIX}.structured-OCR.read-vehicle-title:latest"
    )
])

with EyePopSdk.workerEndpoint(api_key=EYEPOP_API_KEY) as endpoint:
    endpoint.set_pop(pop)
    sample_img_path = Path("/content/sample_img.jpg")
    job = endpoint.upload(sample_img_path)
    # Stream prediction results as they become available
    while result := job.predict():
        print(json.dumps(result, indent=2))

print("Done")

After running the evaluation, you can see what the model labeled and compare it to your source of truth; from there, you can refine your prompt and improve accuracy.
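For example, a minimal field-by-field comparison might look like the sketch below. The compare_fields helper and the field names are hypothetical; use whatever schema your full prompt defines, and build predicted by parsing the JSON text the model returns.

def compare_fields(expected: dict, predicted: dict) -> None:
    """Print which extracted fields match a hand-labeled ground truth."""
    mismatches = {
        field: (want, predicted.get(field))
        for field, want in expected.items()
        if predicted.get(field) != want
    }
    print(f"{len(expected) - len(mismatches)}/{len(expected)} fields correct")
    for field, (want, got) in mismatches.items():
        print(f"  {field}: expected {want!r}, got {got!r}")

# Hand-labeled ground truth for the sample image (illustrative field names)
expected = {"registered_owner": "ALEX RIVERS", "make": "FORD",
            "model": "MUSTANG", "vehicle_year": "2022"}

# In practice, parse this from the model's output, e.g.:
# predicted = json.loads(model_output_text)
predicted = {"registered_owner": "ALEX RIVERS", "make": "FORD",
             "model": "MUSTANG", "vehicle_year": "2021"}

compare_fields(expected, predicted)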

Get early access

Want to move faster with visual automation? Request early access to Abilities and get notified as new vision capabilities roll out.
