Extracts text from a file. Call it with a POST request to the following URL:
[POST] https://azure.leadtools.com/api/Recognition/ExtractText
Common Service Request URL Parameters
The following parameters are required unless indicated otherwise, and are used by all Conversion and Recognition API calls:
Parameter | Description | Accepted Values |
---|---|---|
fileUrl (Optional) | The URL to the file to be processed. For more information, refer to the Cloud Services Overview section. | A string or URI containing a valid URL to the file to be uploaded. |
firstPage | The first page in the file to process. | An integer value between 1 and the total number of pages in the file. |
lastPage | The last page in the file to process. | A value of -1 or 0 tells the service to process every page from firstPage through the last page in the file. Otherwise, pass an integer value between 1 and the total number of pages in the file that is greater than or equal to the value of firstPage. |
guid (Optional) | Unique identifier corresponding to an uploaded file. This value is returned when a file is uploaded using the UploadFile service call. | A valid GUID. |
filePassword (Optional) | The password to unlock a password-protected file. | A string containing the password for a secure PDF. |
callbackUrl (Optional) | Passing a callbackUrl to the service allows us to notify you when your file has finished processing. If the callbackUrl is invalid or malicious, it will be ignored. The LEADTOOLS Cloud Services will send the request’s ID in the body of the message sent to the callbackUrl. | A string or URI containing a valid URL to notify. |
ocrLanguage (Optional) | The OCR language to use when recognizing text in a raster file. Defaults to en (English) if no language is specified. | 0 - en, 1 - bg, 2 - hr, 3 - cs, 4 - da, 5 - nl, 6 - fr, 7 - de, 8 - el, 9 - hu, 10 - it, 11 - pl, 12 - pt, 13 - sr, 14 - es, 15 - sv, 16 - tr, 17 - uk |
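The common parameters above can be assembled into a request URL as in the following minimal sketch. The helper name `build_extract_text_url` and the `OCR_LANGUAGES` lookup are hypothetical and not part of the service. The sketch assumes that ocrLanguage is sent as the numeric value from the table and that the file URL is passed as `fileurl`, as in the examples on this page.

```python
from urllib.parse import urlencode

SERVICES_URL = "https://azure.leadtools.com/api/"

# Numeric ocrLanguage values, copied from the parameter table.
OCR_LANGUAGES = {
    "en": 0, "bg": 1, "hr": 2, "cs": 3, "da": 4, "nl": 5,
    "fr": 6, "de": 7, "el": 8, "hu": 9, "it": 10, "pl": 11,
    "pt": 12, "sr": 13, "es": 14, "sv": 15, "tr": 16, "uk": 17,
}


def build_extract_text_url(first_page=1, last_page=-1, file_url=None,
                           guid=None, ocr_language=None):
    """Hypothetical helper: build the ExtractText URL from the common parameters."""
    if first_page < 1:
        raise ValueError("firstPage must be 1 or greater")
    # -1 and 0 both mean "through the last page in the file".
    if last_page not in (-1, 0) and last_page < first_page:
        raise ValueError("lastPage must be -1, 0, or >= firstPage")

    params = {"firstPage": first_page, "lastPage": last_page}
    if guid is not None:
        params["guid"] = guid
    if file_url is not None:
        params["fileurl"] = file_url
    if ocr_language is not None:
        # Assumption: the service expects the numeric value, not the code.
        params["ocrLanguage"] = OCR_LANGUAGES[ocr_language]

    # urlencode percent-encodes the file URL so it is safe inside a query string.
    return SERVICES_URL + "Recognition/ExtractText?" + urlencode(params)
```

The total page count is not known on the client, so the sketch only checks the ordering of firstPage and lastPage and leaves the upper bound to the service.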
Request Specific Parameters
Additional parameters available are listed below.
Parameter | Description | Accepted Values |
---|---|---|
characterinfo (Optional) | Value indicating whether you want to receive additional data regarding the characters found in each page and their locations. | A Boolean |
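As a rough illustration, the characterinfo flag can be appended to an existing request URL as shown below. The helper name `with_character_info` is hypothetical, and the sketch assumes the Boolean is sent as the literal `true` or `false`, which this page does not state.

```python
from urllib.parse import urlencode


def with_character_info(base_url, enabled):
    """Hypothetical helper: append the characterinfo flag to a request URL."""
    # Use "&" when the URL already carries a query string, "?" otherwise.
    separator = "&" if "?" in base_url else "?"
    # Assumption: the service accepts the lowercase literals "true" and "false".
    flag = str(bool(enabled)).lower()
    return base_url + separator + urlencode({"characterinfo": flag})
```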
Status Codes
The following status codes will be returned when the method is called:
Status | Description |
---|---|
200 | The request has been successfully received. |
400 | The request was not valid for one of the following reasons: required request parameters were not included; the GUID value was not provided; the file information provided was malformed; or the request attempted to queue a file that has not yet been verified. |
401 | The AppID/Password combination is not valid or does not correspond with the GUID provided. |
402 | There are not enough pages left in the Application to process the request. |
500 | There was an internal error processing your request. |
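A client can translate these status codes into readable messages before deciding how to proceed. The following is a minimal sketch; the names `STATUS_DESCRIPTIONS` and `describe_status` are hypothetical, and the messages are shortened from the table above.

```python
# Short descriptions taken from the status code table.
STATUS_DESCRIPTIONS = {
    200: "The request has been successfully received.",
    400: "The request was not valid.",
    401: "The AppID/Password combination is not valid.",
    402: "There are not enough pages left in the Application.",
    500: "There was an internal error processing the request.",
}


def describe_status(status_code):
    """Hypothetical helper: return a readable message for a status code."""
    return STATUS_DESCRIPTIONS.get(
        status_code, "Unexpected status code {}".format(status_code)
    )


def is_accepted(status_code):
    """Only 200 means the service accepted the request."""
    return status_code == 200
```

A status of 200 indicates only that the request was received, not that the extraction has finished.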
Returns
If performing a single-service call, a unique identifier is returned that can be used to query the progress of the extraction.
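Before storing the returned identifier, a client may want to confirm that the response body looks like one. The sketch below assumes the identifier is a GUID and that the body may arrive wrapped in double quotes or whitespace; neither detail is confirmed on this page, and the helper name `parse_request_id` is hypothetical.

```python
import uuid


def parse_request_id(body):
    """Hypothetical helper: normalize the identifier returned by the service.

    Raises ValueError if the body is not a well-formed GUID.
    """
    # Assumption: the body may be padded with whitespace or wrapped in quotes.
    candidate = body.strip().strip('"')
    # uuid.UUID validates the format; str() gives the lowercase hyphenated form.
    return str(uuid.UUID(candidate))
```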
Online Demo
This method is available for free in our live online demo. You do not need an account and you can test out your own files to see the results.
Examples
//Simple script to make and process the results of an ExtractText request to the LEADTOOLS CloudServices.
const request = require('request');

var servicesUrl = "https://azure.leadtools.com/api/";

//The first page in the file to mark for processing
var firstPage = 1;

//Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed.
var lastPage = -1;

//We will be uploading the file via a URL. Files can also be passed by adding a PostFile to the request. Only 1 file will be accepted per request.
//The services will use the following priority when determining what a request is trying to do GUID > URL > Request Body Content
var fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf';

var recognitionUrl = servicesUrl + 'Recognition/ExtractText?firstPage=' + firstPage + '&lastPage=' + lastPage + '&fileurl=' + fileURL;

request.post(getRequestOptions(recognitionUrl), recognitionCallback);

function recognitionCallback(error, response, body){
    if(!error && response.statusCode == 200){
        var guid = body;
        console.log("Unique ID returned by the Services: " + guid);
    }
}

function getRequestOptions(url){
    //Function to generate and return HTTP request options.
    var requestOptions = {
        url: url,
        headers: {
            'Content-Length' : 0
        },
        auth: {
            user: "Enter Application ID",
            password: "Enter Application Password"
        }
    };
    return requestOptions;
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using Newtonsoft.Json.Linq;

namespace Azure_Code_Snippets.DocumentationSnippets
{
    class CloudServices_ExtractText_Demo
    {
        private string hostedServicesUrl = "https://azure.leadtools.com/api/";

        //Returns a Task so that callers can await the request and observe any exception.
        public async Task ExtractText()
        {
            //The first page in the file to mark for processing
            int firstPage = 1;
            //Sending a value of -1 will indicate to the service that all pages in the file should be processed.
            int lastPage = -1;
            string fileURL = "https://demo.leadtools.com/images/pdf/leadtools.pdf";
            string recognitionUrl = string.Format("Recognition/ExtractText?firstPage={0}&lastPage={1}&fileurl={2}", firstPage, lastPage, fileURL);

            var client = InitClient();
            var result = await client.PostAsync(recognitionUrl, null);
            if (result.StatusCode == HttpStatusCode.OK)
            {
                //Unique ID returned by the services
                string id = await result.Content.ReadAsStringAsync();
                Console.WriteLine("Unique ID returned by the services: " + id);
            }
            else
                Console.WriteLine("Request failed with the following response: " + result.StatusCode);
        }

        private HttpClient InitClient()
        {
            string AppId = "Enter Application ID";
            string Password = "Enter Application Password";

            HttpClient client = new HttpClient();
            client.BaseAddress = new Uri(hostedServicesUrl);
            client.DefaultRequestHeaders.Accept.Clear();
            client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json"));

            //Send the Application ID and Password using HTTP Basic authentication.
            string authData = string.Format("{0}:{1}", AppId, Password);
            string authHeaderValue = Convert.ToBase64String(Encoding.UTF8.GetBytes(authData));
            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", authHeaderValue);
            return client;
        }
    }
}
#Simple script to make an ExtractText request to the LEADTOOLS CloudServices and print the returned ID.
import requests, sys

servicesUrl = 'https://azure.leadtools.com/api/'
baseRecognitionUrl = '{}Recognition/ExtractText?firstPage={}&lastPage={}&fileurl={}'

#The first page in the file to mark for processing
firstPage = 1

#Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed.
lastPage = -1

#We will be uploading the file via a URL. Files can also be passed by adding a PostFile to the request. Only 1 file will be accepted per request.
#The services will use the following priority when determining what a request is trying to do GUID > URL > Request Body Content
fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf'

formattedRecognitionUrl = baseRecognitionUrl.format(servicesUrl, firstPage, lastPage, fileURL)

#The application ID.
appId = "Enter Application ID"

#The application password.
password = "Enter Application Password"

request = requests.post(formattedRecognitionUrl, auth=(appId, password))
if request.status_code != 200:
    print("Error sending the recognition request \n")
    print(request.text)
    sys.exit()

#Grab the GUID from the Request
guid = request.text
print("Unique ID returned by the services: " + guid + "\n")
<?php
//Simple script to make an ExtractText request to the LEADTOOLS CloudServices and print the returned ID.
$servicesBaseUrl = "https://azure.leadtools.com/api/";
$baseRecognitionURL = '%sRecognition/ExtractText?firstPage=%s&lastPage=%s&fileurl=%s';

//The first page in the file to mark for processing
$firstPage = 1;

//Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed.
$lastPage = -1;

//We will be uploading the file via a URL. Files can also be passed by adding a PostFile to the request. Only 1 file will be accepted per request.
//The services will use the following priority when determining what a request is trying to do GUID > URL > Request Body Content
$fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf';

$formattedRecognitionURL = sprintf($baseRecognitionURL, $servicesBaseUrl, $firstPage, $lastPage, $fileURL);
$recognitionRequestOptions = GeneratePostOptions($formattedRecognitionURL);

$request = curl_init();
curl_setopt_array($request, $recognitionRequestOptions); //Set the request URL and options

if(!$guid = curl_exec($request))
{
    echo "There was an error processing the request. \n\r";
    echo curl_error($request); //Print the error reported by cURL
    exit;
}
curl_close($request); //Close the request

echo "Unique ID returned by the services: $guid \n\r";

function GeneratePostOptions($url)
{
    $appId = "Enter Application ID";
    $password = "Enter Application Password";
    $headers = array(
        "Content-Length: 0"
    );
    $postOptions = array(
        CURLOPT_POST => 1,
        CURLOPT_URL => $url,
        CURLOPT_FRESH_CONNECT => 1,
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_USERPWD => "$appId:$password",
        CURLOPT_FORBID_REUSE => 1,
        CURLOPT_HTTPHEADER => $headers
    );
    return $postOptions;
}
?>
#Simple script to make and process the results of an ExtractText request to the LEADTOOLS CloudServices.
use base 'HTTP::Message';
use LWP::UserAgent ();
require HTTP::Request;
require HTTP::Headers;

my $servicesUrl = "https://azure.leadtools.com/api/";

#The first page in the file to mark for processing
my $firstPage = 1;

#Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed.
my $lastPage = -1;

#We will be uploading the file via a URL. Files can also be passed by adding a PostFile to the request. Only 1 file will be accepted per request.
#The services will use the following priority when determining what a request is trying to do GUID > URL > Request Body Content
my $fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf';

my $appId = 'Enter Application ID';
my $password = 'Enter Application Password';

my $headers = HTTP::Headers->new(
    Content_Length => 0
);
$headers->authorization_basic($appId, $password);

#The User Agent to be used when making requests
my $ua = LWP::UserAgent->new;

#For the purposes of this script, we will be extracting text from a PDF file.
my $recognitionUrl = $servicesUrl . 'Recognition/ExtractText?firstPage=' . $firstPage . '&lastPage=' . $lastPage . '&fileurl=' . $fileURL;

my $request = HTTP::Request->new(POST => $recognitionUrl, $headers);
my $response = $ua->request($request);
if(!$response->is_success){
    print STDERR $response->status_line, "\n";
    exit;
}

my $guid = $response->decoded_content;
print("Unique ID returned by the services: " . $guid . "\n");