Extracts text from a file. Call it by sending a POST request to the following URL:

[POST]  https://azure.leadtools.com/api/Recognition/ExtractText

Common Service Request URL Parameters

The following parameters are required unless indicated otherwise, and are used by all Conversion and Recognition API calls:

Parameter Description Accepted Values
fileUrl (Optional) The URL to the file to be processed. For more information, refer to the Cloud Services Overview section. A string or URI containing a valid URL to the file to be uploaded.
firstPage The first page in the file to process. An integer value between 1 and the total number of pages in the file.
lastPage The last page in the file to process. Passing a value of -1 or 0 tells the service to process every page from the firstPage value through the last page in the file. Otherwise, pass an integer between 1 and the total number of pages in the file that is greater than or equal to the firstPage value.
guid (Optional) Unique identifier corresponding to an uploaded file. This value will be returned when a file is uploaded using the UploadFile service call. A valid GUID
filePassword (Optional) The password to unlock a password protected file. A string.
callbackUrl (Optional) Passing a callbackUrl to the service allows us to notify you when your file has finished processing. If the callbackUrl is invalid or malicious, it will be ignored. The LEADTOOLS Cloud Services will send the request's ID in the body of the message sent to the callbackUrl. A string or URI containing a valid URL to notify.
ocrLanguage (Optional) The OCR language to use when recognizing text in a raster file. Defaults to en (English) if no language is specified. 0 - en
1 - af
2 - sq
3 - az
4 - eu
5 - be
6 - bg
7 - ca
8 - hr
9 - cs
10 - da
11 - nl
12 - et
13 - fi
14 - fr
15 - gl
16 - de
17 - el
18 - hu
19 - is
20 - id
21 - it
22 - ko
23 - lv
24 - lt
25 - mk
26 - ms
27 - mt
28 - no
29 - pl
30 - pt
31 - ro
32 - ru
33 - sr
34 - sk
35 - sl
36 - es
37 - sw
38 - sv
39 - te
40 - th
41 - tr
42 - uk
43 - vi
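
The common parameters above can be assembled into a request URL along the following lines. This is a minimal sketch rather than an official client: the helper name build_extract_text_url is hypothetical, and it assumes the query parameter names match the table (firstPage, lastPage, fileUrl, guid) and that the service accepts a percent-encoded fileUrl value.

```python
from urllib.parse import urlencode

SERVICES_URL = "https://azure.leadtools.com/api/"


def build_extract_text_url(first_page, last_page, file_url=None, guid=None):
    """Hypothetical helper: build an ExtractText URL from the common parameters."""
    if first_page < 1:
        raise ValueError("firstPage must be at least 1")
    # -1 or 0 means "through the last page"; any other value must not precede firstPage.
    if last_page not in (-1, 0) and last_page < first_page:
        raise ValueError("lastPage must be -1, 0, or greater than or equal to firstPage")

    params = {"firstPage": first_page, "lastPage": last_page}
    if guid is not None:
        params["guid"] = guid
    if file_url is not None:
        # urlencode percent-encodes the URL so its ':' and '/' stay inside one value.
        params["fileUrl"] = file_url
    return SERVICES_URL + "Recognition/ExtractText?" + urlencode(params)


url = build_extract_text_url(1, -1, file_url="https://demo.leadtools.com/images/pdf/leadtools.pdf")
print(url)
```

Validating the page range locally only catches obvious mistakes; the service remains the authority on whether a page number exceeds the length of the file.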

Request Specific Parameters

Additional parameters available are listed below.

Parameter Description Accepted Values
characterData (Optional) Value indicating whether you want to receive additional data regarding the Characters found in each page and their locations. A Boolean
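
As an illustration of the characterData parameter, the sketch below adds it to an ExtractText query string. The function name extract_text_query is hypothetical, and the lowercase "true" spelling of the Boolean is an assumption, since the table only states that the value is a Boolean.

```python
from urllib.parse import urlencode


def extract_text_query(file_url, character_data=False):
    """Hypothetical helper: build the query string for an ExtractText request."""
    params = {"firstPage": 1, "lastPage": -1, "fileUrl": file_url}
    if character_data:
        # Assumption: the service accepts the lowercase literal "true" for a Boolean.
        params["characterData"] = "true"
    return urlencode(params)


print(extract_text_query("https://demo.leadtools.com/images/pdf/leadtools.pdf", character_data=True))
```

Leaving the parameter out entirely when it is not wanted keeps the request on the service's default behavior.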

Status Codes

The following status codes will be returned when the method is called:

Status Description
200 The ExtractText request has been successfully received.
400 The request was not valid for one of the following reasons:

* Required request parameters were not included.
* GUID value was not provided.
* File information provided was malformed.
* Attempting to queue a request on a file that has not yet been verified.
401 The AppID/Password combination is not valid, or does not correspond with the GUID provided.
402 There are not enough pages left in the Application to process the request.
500 There was an internal error processing your request.
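
One way a client might act on the status codes above is sketched here. The names ExtractTextError and check_status are hypothetical, and the descriptions are paraphrased from the table rather than taken from actual service responses.

```python
# Descriptions paraphrased from the status code table above.
STATUS_DESCRIPTIONS = {
    200: "The ExtractText request has been successfully received.",
    400: "The request was not valid (missing parameters, missing GUID, "
         "malformed file information, or an unverified file).",
    401: "The AppID/Password combination is not valid, or does not correspond with the GUID provided.",
    402: "There are not enough pages left in the Application to process the request.",
    500: "There was an internal error processing the request.",
}


class ExtractTextError(Exception):
    """Hypothetical exception type for a non-200 ExtractText response."""

    def __init__(self, status_code, description):
        super().__init__(f"{status_code}: {description}")
        self.status_code = status_code


def check_status(status_code):
    """Return silently on 200; raise ExtractTextError for any other code."""
    if status_code == 200:
        return
    description = STATUS_DESCRIPTIONS.get(status_code, "Unexpected status code.")
    raise ExtractTextError(status_code, description)
```

A caller would pass the status code of the HTTP response, for example check_status(response.status_code) when using the requests library, and handle ExtractTextError where a failed request should be reported.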

Returns

A single-service call to ExtractText returns a unique identifier that can be used to query the progress of the extraction.
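
Before storing the returned identifier, a client may want to normalize it. The sketch below assumes two things that this page does not confirm: that the identifier is formatted as a GUID, and that the response body may wrap it in JSON-style double quotes. The function name parse_request_id is hypothetical.

```python
import uuid


def parse_request_id(body):
    """Hypothetical helper: normalize the identifier returned by ExtractText."""
    # Assumption: the body is the bare identifier, possibly quoted and with trailing whitespace.
    candidate = body.strip().strip('"')
    # uuid.UUID raises ValueError if the text is not a well-formed GUID.
    return str(uuid.UUID(candidate))


print(parse_request_id('"3F2504E0-4F89-11D3-9A0C-0305E82C3301"\n'))
```

If the service returns identifiers in some other format, the uuid check should be dropped and the stripped text used as-is.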

Online Demo

This method is available for free in our live online demo. You do not need an account and you can test out your own files to see the results.

Examples


//Simple script to make and process the results of an ExtractText request to the LEADTOOLS CloudServices.

const request = require('request');

var servicesUrl = "https://azure.leadtools.com/api/";

//The first page in the file to mark for processing
var firstPage = 1;

//Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed.
var lastPage = -1;

//We will be uploading the file via a URL.  Files can also be passed by adding a PostFile to the request.  Only 1 file will be accepted per request.
//The services will use the following priority when determining what a request is trying to do: GUID > URL > Request Body Content
var fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf';

//URL-encode the file URL so its own ':' and '/' characters do not interfere with the query string.
var recognitionUrl = servicesUrl + 'Recognition/ExtractText?firstPage=' + firstPage + '&lastPage=' + lastPage + '&fileurl=' + encodeURIComponent(fileURL);

request.post(getRequestOptions(recognitionUrl), recognitionCallback);


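function recognitionCallback(error, response, body){
    //Report transport-level failures instead of silently ignoring them.
    if(error){
        console.error("Request failed: " + error);
        return;
    }
    if(response.statusCode !== 200){
        console.error("Request failed with status code " + response.statusCode + ": " + body);
        return;
    }
    var guid = body;
    console.log("Unique ID returned by the Services: " + guid);
}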

function getRequestOptions(url){
    //Function to generate and return HTTP request  options.
    var requestOptions ={
        url: url,
        headers: {
            'Content-Length' : 0
        },
        auth: {
            user:"Enter Application ID",
            password:"Enter Application Password"
        }
    };
    return requestOptions;
}


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using Newtonsoft.Json.Linq;

namespace Azure_Code_Snippets.DocumentationSnippets
{
   class CloudServices_ExtractText_Demo
   {
      private string hostedServicesUrl = "https://azure.leadtools.com/api/";
      public async Task ExtractText()
      {
         //The first page in the file to mark for processing
         int firstPage = 1;

         //Sending a value of -1 will indicate to the service that all pages in the file should be processed.
         int lastPage = -1;

         string fileURL = "https://demo.leadtools.com/images/pdf/leadtools.pdf";

         //URL-encode the file URL so it is safe to embed in the query string.
         string recognitionUrl = string.Format("Recognition/ExtractText?firstPage={0}&lastPage={1}&fileurl={2}", firstPage, lastPage, Uri.EscapeDataString(fileURL));

         var client = InitClient();
         var result = await client.PostAsync(recognitionUrl, null);
         if (result.StatusCode == HttpStatusCode.OK)
         {
            //Unique ID returned by the services
            string id = await result.Content.ReadAsStringAsync();
            Console.WriteLine("Unique ID returned by the services: " + id);
         }
         else
            Console.WriteLine("Request failed with the following response: " + result.StatusCode);

      }

      private HttpClient InitClient()
      {
         string AppId = "Enter Application ID";
         string Password = "Enter Application Password";

         HttpClient client = new HttpClient();
         client.BaseAddress = new Uri(hostedServicesUrl);
         client.DefaultRequestHeaders.Accept.Clear();
         client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));

         string authData = string.Format("{0}:{1}", AppId, Password);
         string authHeaderValue = Convert.ToBase64String(Encoding.UTF8.GetBytes(authData));
         client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", authHeaderValue);

         return client;
      }
   }
}


#Simple script to make an ExtractText request to the LEADTOOLS CloudServices and print the unique ID it returns.

import sys
from urllib.parse import quote

import requests

servicesUrl = 'https://azure.leadtools.com/api/'

baseRecognitionUrl = '{}Recognition/ExtractText?firstPage={}&lastPage={}&fileurl={}'

#The first page in the file to mark for processing
firstPage = 1

#Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed.
lastPage = -1


#We will be uploading the file via a URL.  Files can also be passed by adding a PostFile to the request.  Only 1 file will be accepted per request.
#The services will use the following priority when determining what a request is trying to do: GUID > URL > Request Body Content
fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf'

#URL-encode the file URL before placing it in the query string.
formattedRecognitionUrl = baseRecognitionUrl.format(servicesUrl, firstPage, lastPage, quote(fileURL, safe=''))

#The application ID.
appId = "Enter Application ID"

#The application password.
password = "Enter Application Password"

response = requests.post(formattedRecognitionUrl, auth=(appId, password))
if response.status_code != 200:
    print("Error sending the recognition request \n")
    print(response.text)
    sys.exit(1)

#Grab the GUID from the response
guid = response.text
print("Unique ID returned by the services: " + guid + "\n")


<?php
    //Simple script to make an ExtractText request to the LEADTOOLS CloudServices and print the unique ID it returns.

    $servicesBaseUrl = "https://azure.leadtools.com/api/";

    $baseRecognitionURL = '%sRecognition/ExtractText?firstPage=%s&lastPage=%s&fileurl=%s';

    //The first page in the file to mark for processing
    $firstPage = 1;

    //Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed.
    $lastPage = -1;

    //We will be uploading the file via a URL.  Files can also be passed by adding a PostFile to the request.  Only 1 file will be accepted per request.
    //The services will use the following priority when determining what a request is trying to do: GUID > URL > Request Body Content
    $fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf';

    //URL-encode the file URL before placing it in the query string.
    $formattedConversionURL = sprintf($baseRecognitionURL, $servicesBaseUrl, $firstPage, $lastPage, urlencode($fileURL));

    $conversionRequestOptions = GeneratePostOptions($formattedConversionURL);

    $request = curl_init();
    curl_setopt_array($request, $conversionRequestOptions); //Set the request URL

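    $guid = curl_exec($request);
    if($guid === false)
    {
        //curl_exec returns false on failure, so report the cURL error rather than the empty result.
        echo "There was an error processing the request. \r\n";
        echo curl_error($request);
        curl_close($request);
        exit;
    }
    curl_close($request); //Close the request

    echo "Unique ID returned by the services: $guid \r\n";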

    function GeneratePostOptions($url)
    {
        $appId = "Enter Application ID";
        $password = "Enter Application Password";
        $headers = array(
            "Content-Length: 0"
            );
        $postOptions = array(
            CURLOPT_POST => 1,
            CURLOPT_URL => $url,
            CURLOPT_FRESH_CONNECT => 1,
            CURLOPT_RETURNTRANSFER => 1,
            CURLOPT_USERPWD => "$appId:$password",
            CURLOPT_FORBID_REUSE => 1,
            CURLOPT_HTTPHEADER => $headers
        );
        return $postOptions;
    }
?>


#Simple script to make and process the results of an ExtractText request to the LEADTOOLS CloudServices.

use base 'HTTP::Message';
use LWP::UserAgent ();

require HTTP::Request;
require HTTP::Headers;

my $servicesUrl = "https://azure.leadtools.com/api/";

#The first page in the file to mark for processing
my $firstPage = 1;

#Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed.
my $lastPage = -1;

#We will be uploading the file via a URL.  Files can also be passed by adding a PostFile to the request.  Only 1 file will be accepted per request.
#The services will use the following priority when determining what a request is trying to do: GUID > URL > Request Body Content
my $fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf';

my $appId = 'Enter Application ID';
my $password = 'Enter Application Password';
my $headers = HTTP::Headers->new(
    Content_Length => 0
);
$headers->authorization_basic($appId, $password);


#The User Agent to be used when making requests
my $ua = LWP::UserAgent->new;

#Build the ExtractText request URL, URL-encoding the file URL so it is safe inside the query string.
require URI::Escape;
my $recognitionUrl = $servicesUrl . 'Recognition/ExtractText?firstPage=' . $firstPage . '&lastPage=' . $lastPage . '&fileurl=' . URI::Escape::uri_escape($fileURL);

my $request = HTTP::Request->new(POST => $recognitionUrl, $headers);
my $response = $ua->request($request);
if(!$response->is_success){
    print STDERR $response->status_line, "\n";
    exit;
}

my $guid = $response->decoded_content;
print("Unique ID returned by the services: " . $guid . "\n");