General Document Layout

General Document Layout is a layout-aware OCR service that detects and classifies each text region in a document by its semantic type — such as headers, paragraphs, tables, figures, and formulas — in addition to performing text recognition.

General Document Layout Object

  • Name
    status
    Type
    string
    Description

    Enum code indicating the status of the reading result.

    1. SUCCESS
    2. NO_FILE
    3. FILE_INVALID_FORMAT
    4. FAILED
  • Name
    reason
    Type
    string
    Description

    A human-readable message providing more details about the reading result.

  • Name
    read
    Type
    object
    Description

    Contains the reading for General Document Layout fields.

    • Name
      elements
      Type
      array of object
      Description

      Array of detected elements in the document. Each element is classified by its layout type.

      • Name
        class_name
        Type
        string
        Description

        The semantic class of the detected element. Possible values:

        1. inline_formula — Mathematical or scientific formula rendered inline
        2. header — Page header or section heading
        3. paragraph_title — Paragraph or section title
        4. text — Regular body text
        5. figure_title — Caption or title of a figure
        6. table — Tabular data structure
      • Name
        polygon
        Type
        array of array of number
        Description

        Coordinates of the element bounding box: [[top-left-x, top-left-y], [top-right-x, top-right-y], [bottom-right-x, bottom-right-y], [bottom-left-x, bottom-left-y]]

      • Name
        confidence
        Type
        number
        Description

        Overall detection confidence score for the element (0 to 1).

      • Name
        value
        Type
        string
        Description

        Concatenated text content of the element.

      • Name
        confidence_text
        Type
        number
        Description

        Confidence score for the text recognition within the element (0 to 1).

      • Name
        lines
        Type
        array of object
        Description

        Array of text lines within the element.

        • Name
          value
          Type
          string
          Description

          Recognized text of the line.

        • Name
          value_original
          Type
          string
          Description

          Original raw OCR text of the line before post-processing.

        • Name
          confidence
          Type
          number
          Description

          Confidence score for this line (0 to 1).

        • Name
          polygon
          Type
          array of array of number
          Description

          Bounding box coordinates of the line.

      • Name
        words
        Type
        array of object
        Description

        Array of individual words in the element (flat list, not grouped by line).

        • Name
          value
          Type
          string
          Description

          Recognized text of the word.

        • Name
          value_original
          Type
          string
          Description

          Original raw OCR text of the word before post-processing.

        • Name
          confidence
          Type
          number
          Description

          Confidence score for this word (0 to 1).

        • Name
          polygon
          Type
          array of array of number
          Description

          Bounding box coordinates of the word.

      • Name
        page_index
        Type
        number
        Description

        The page number where the element is located (zero-based).

      • Name
        tables
        Type
        array of object
        Description

        Present only when class_name is table. Contains the structured table data.

        • Name
          status
          Type
          string
          Description

          Enum code indicating the status of the table reading result.

        • Name
          reason
          Type
          string
          Description

          A human-readable message providing more details about the table reading result.

        • Name
          read
          Type
          object
          Description

          Contains the structured reading for the table.

          • Name
            table
            Type
            object
            Description

            The structured table data.

            • Name
              row_count
              Type
              number
              Description

              Number of rows in the table.

            • Name
              column_count
              Type
              number
              Description

              Number of columns in the table.

            • Name
              cells
              Type
              array of object
              Description

              Array of individual cells in the table.

              • Name
                row_index
                Type
                number
                Description

                Zero-based row index of the cell.

              • Name
                column_index
                Type
                number
                Description

                Zero-based column index of the cell.

              • Name
                row_span
                Type
                number
                Description

                Number of rows the cell spans.

              • Name
                column_span
                Type
                number
                Description

                Number of columns the cell spans.

              • Name
                is_header
                Type
                boolean
                Description

                Whether the cell is a header cell.

              • Name
                is_projected_row_header
                Type
                boolean
                Description

                Whether the cell is projected as a row header.

              • Name
                value
                Type
                string
                Description

                Text content of the cell.

              • Name
                confidence_text
                Type
                number
                Description

                Confidence score for the text recognition in this cell (0 to 1).

              • Name
                polygon_text
                Type
                array of array of number
                Description

                Bounding box coordinates of the text within the cell.

              • Name
                polygon
                Type
                array of array of number
                Description

                Bounding box coordinates of the cell.

              • Name
                polygon_text_detector
                Type
                array of array of number
                Description

                Bounding box coordinates from the text detector.

              • Name
                list_value_text
                Type
                array of string
                Description

                Array of individual text values within the cell.

              • Name
                confidence_polygon
                Type
                number
                Description

                Confidence score for the cell polygon detection (0 to 1).

        • Name
          id
          Type
          string or null
          Description

          Identifier for the table, if available.
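As a rough illustration of how the object above might be consumed, the sketch below groups detected elements by class_name and reduces each four-point polygon to an axis-aligned box. The function names and the min_confidence threshold are hypothetical helpers, not part of the API; the field names are taken from the object description.

```python
from collections import defaultdict


def polygon_to_bbox(polygon):
    """Return (x_min, y_min, x_max, y_max) for a list of [x, y] points."""
    xs = [point[0] for point in polygon]
    ys = [point[1] for point in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def group_elements(response, min_confidence=0.0):
    """Group element summaries by class_name, skipping low-confidence detections."""
    grouped = defaultdict(list)
    for element in response.get("read", {}).get("elements", []):
        if element.get("confidence", 0.0) < min_confidence:
            continue
        grouped[element["class_name"]].append({
            "page_index": element.get("page_index"),
            "bbox": polygon_to_bbox(element["polygon"]),
            # Not every element necessarily carries "value", so default to "".
            "text": element.get("value", ""),
        })
    return dict(grouped)


# Trimmed-down payload using values from the sample response.
sample = {
    "status": "SUCCESS",
    "read": {"elements": [
        {"class_name": "header",
         "polygon": [[72, 28], [552, 28], [552, 60], [72, 60]],
         "confidence": 0.99, "value": "ANNUAL REPORT 2024", "page_index": 0},
        {"class_name": "inline_formula",
         "polygon": [[72, 210], [240, 210], [240, 240], [72, 240]],
         "confidence": 0.88, "value": "E = mc²", "page_index": 0},
    ]},
}
result = group_elements(sample, min_confidence=0.9)
print(result)
```

The threshold of 0.9 is only an example; a suitable cut-off depends on the documents being processed.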


POST /ocr/v1/general-document-layout

Read General Document Layout

Reads a document image and returns layout-aware OCR results, with each detected element classified by its layout type.

Required parameter

  • Name
    image
    Type
    file (.png, .jpg, .jpeg, .tiff, .pdf)
    Description

    The image file for the document.
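A client could check the file extension before uploading, to avoid a round trip that would end in a FILE_INVALID_FORMAT response. The sketch below is a minimal, hypothetical pre-check; the helper name is invented here, and matching extensions case-insensitively is an assumption, since the server's own validation rules are not documented in this section.

```python
from pathlib import Path

# Extensions listed for the "image" parameter above.
ACCEPTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".pdf"}


def has_accepted_extension(path):
    """Return True if the file name ends with one of the listed extensions.

    Assumption: the comparison ignores case (e.g. "SCAN.PDF" passes).
    """
    return Path(path).suffix.lower() in ACCEPTED_EXTENSIONS


print(has_accepted_extension("/path/to/image/document.jpg"))
print(has_accepted_extension("notes.txt"))
```

This only inspects the file name; it does not verify that the file content is actually a valid image or PDF.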

Sample Request

POST /ocr/v1/general-document-layout

curl -v -L -X POST 'https://api.vision.glair.ai/ocr/v1/general-document-layout' \
  -H "Authorization: Basic $(printf "%s" "USERNAME:PASSWORD" | base64)" \
  -H 'x-api-key: API_KEY' \
  -F 'image=@"/path/to/image/document.jpg"'
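For readers who prefer Python over curl, the following sketch builds an equivalent request using only the standard library. It mirrors the headers and the image form field from the curl sample; the function name build_layout_request is hypothetical, and the hand-rolled multipart encoding is one possible approach rather than an official client.

```python
import base64
import mimetypes
import urllib.request
import uuid

API_URL = "https://api.vision.glair.ai/ocr/v1/general-document-layout"


def build_layout_request(username, password, api_key, filename, file_bytes):
    """Build (but do not send) a multipart POST mirroring the curl sample."""
    # Same value as: printf "%s" "USERNAME:PASSWORD" | base64
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # A single form-data part named "image", as in -F 'image=@...'.
    body = b"".join([
        f"--{boundary}\r\n".encode("utf-8"),
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'.encode("utf-8"),
        f"Content-Type: {content_type}\r\n\r\n".encode("utf-8"),
        file_bytes,
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ])
    headers = {
        "Authorization": f"Basic {token}",
        "x-api-key": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


# Placeholder credentials and bytes; nothing is sent over the network here.
request = build_layout_request("USERNAME", "PASSWORD", "API_KEY", "document.jpg", b"...")
print(request.get_method(), request.full_url)
```

To actually send it, the request object can be passed to urllib.request.urlopen, which raises urllib.error.HTTPError for 4xx and 5xx responses; the error object still exposes the response body and headers.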

Sample Response

{
  "status": "SUCCESS",
  "reason": "File Successfully Read",
  "read": {
    "elements": [
      {
        "class_name": "header",
        "polygon": [[72, 28], [552, 28], [552, 60], [72, 60]],
        "confidence": 0.99,
        "value": "ANNUAL REPORT 2024",
        "confidence_text": 0.98,
        "lines": [
          { "value": "ANNUAL REPORT 2024", "value_original": "ANNUAL REPORT 2024", "confidence": 0.98, "polygon": [[72, 28], [552, 28], [552, 60], [72, 60]] }
        ],
        "words": [
          { "value": "ANNUAL", "value_original": "ANNUAL", "confidence": 0.99, "polygon": [[72, 28], [174, 28], [174, 60], [72, 60]] },
          { "value": "REPORT", "value_original": "REPORT", "confidence": 0.98, "polygon": [[184, 28], [296, 28], [296, 60], [184, 60]] },
          { "value": "2024", "value_original": "2024", "confidence": 0.99, "polygon": [[306, 28], [552, 28], [552, 60], [306, 60]] }
        ],
        "page_index": 0
      },
      {
        "class_name": "paragraph_title",
        "polygon": [[72, 88], [378, 88], [378, 114], [72, 114]],
        "confidence": 0.97,
        "value": "Financial Highlights",
        "confidence_text": 0.97,
        "lines": [
          { "value": "Financial Highlights", "value_original": "Financial Highlights", "confidence": 0.97, "polygon": [[72, 88], [378, 88], [378, 114], [72, 114]] }
        ],
        "words": [
          { "value": "Financial", "value_original": "Financial", "confidence": 0.97, "polygon": [[72, 88], [208, 88], [208, 114], [72, 114]] },
          { "value": "Highlights", "value_original": "Highlights", "confidence": 0.97, "polygon": [[218, 88], [378, 88], [378, 114], [218, 114]] }
        ],
        "page_index": 0
      },
      {
        "class_name": "text",
        "polygon": [[72, 130], [552, 130], [552, 190], [72, 190]],
        "confidence": 0.96,
        "value": "The company reported a 15% increase in revenue compared to the previous fiscal year, driven by strong performance across all business segments.",
        "confidence_text": 0.95,
        "lines": [
          { "value": "The company reported a 15% increase in revenue compared to the previous fiscal", "value_original": "The company reported a 15% increase in revenue compared to the previous fiscal", "confidence": 0.96, "polygon": [[72, 130], [552, 130], [552, 154], [72, 154]] },
          { "value": "year, driven by strong performance across all business segments.", "value_original": "year, driven by strong performance across all business segments.", "confidence": 0.95, "polygon": [[72, 168], [520, 168], [520, 190], [72, 190]] }
        ],
        "words": [
          { "value": "The", "value_original": "The", "confidence": 0.99, "polygon": [[72, 130], [102, 130], [102, 154], [72, 154]] },
          { "value": "company", "value_original": "company", "confidence": 0.98, "polygon": [[110, 130], [200, 130], [200, 154], [110, 154]] },
          { "value": "reported", "value_original": "reported", "confidence": 0.97, "polygon": [[208, 130], [314, 130], [314, 154], [208, 154]] },
          { "value": "a", "value_original": "a", "confidence": 0.99, "polygon": [[322, 130], [338, 130], [338, 154], [322, 154]] },
          { "value": "15%", "value_original": "15%", "confidence": 0.95, "polygon": [[346, 130], [390, 130], [390, 154], [346, 154]] },
          { "value": "increase", "value_original": "increase", "confidence": 0.97, "polygon": [[398, 130], [506, 130], [506, 154], [398, 154]] },
          { "value": "in", "value_original": "in", "confidence": 0.99, "polygon": [[514, 130], [534, 130], [534, 154], [514, 154]] },
          { "value": "revenue", "value_original": "revenue", "confidence": 0.97, "polygon": [[72, 154], [162, 154], [162, 178], [72, 178]] },
          { "value": "compared", "value_original": "compared", "confidence": 0.97, "polygon": [[170, 154], [294, 154], [294, 178], [170, 178]] },
          { "value": "to", "value_original": "to", "confidence": 0.99, "polygon": [[302, 154], [326, 154], [326, 178], [302, 178]] },
          { "value": "the", "value_original": "the", "confidence": 0.99, "polygon": [[334, 154], [370, 154], [370, 178], [334, 178]] },
          { "value": "previous", "value_original": "previous", "confidence": 0.97, "polygon": [[378, 154], [486, 154], [486, 178], [378, 178]] },
          { "value": "fiscal", "value_original": "fiscal", "confidence": 0.97, "polygon": [[494, 154], [552, 154], [552, 178], [494, 178]] },
          { "value": "year,", "value_original": "year,", "confidence": 0.98, "polygon": [[72, 168], [124, 168], [124, 190], [72, 190]] },
          { "value": "driven", "value_original": "driven", "confidence": 0.97, "polygon": [[132, 168], [210, 168], [210, 190], [132, 190]] },
          { "value": "by", "value_original": "by", "confidence": 0.99, "polygon": [[218, 168], [242, 168], [242, 190], [218, 190]] },
          { "value": "strong", "value_original": "strong", "confidence": 0.97, "polygon": [[250, 168], [326, 168], [326, 190], [250, 190]] },
          { "value": "performance", "value_original": "performance", "confidence": 0.96, "polygon": [[334, 168], [476, 168], [476, 190], [334, 190]] },
          { "value": "across", "value_original": "across", "confidence": 0.97, "polygon": [[72, 188], [148, 188], [148, 210], [72, 210]] },
          { "value": "all", "value_original": "all", "confidence": 0.99, "polygon": [[156, 188], [182, 188], [182, 210], [156, 210]] },
          { "value": "business", "value_original": "business", "confidence": 0.97, "polygon": [[190, 188], [292, 188], [292, 210], [190, 210]] },
          { "value": "segments.", "value_original": "segments.", "confidence": 0.96, "polygon": [[300, 188], [420, 188], [420, 210], [300, 210]] }
        ],
        "page_index": 0
      },
      {
        "class_name": "figure_title",
        "polygon": [[72, 696], [368, 696], [368, 720], [72, 720]],
        "confidence": 0.97,
        "value": "Figure 1: Revenue Growth by Quarter",
        "confidence_text": 0.96,
        "lines": [
          { "value": "Figure 1: Revenue Growth by Quarter", "value_original": "Figure 1: Revenue Growth by Quarter", "confidence": 0.96, "polygon": [[72, 696], [368, 696], [368, 720], [72, 720]] }
        ],
        "words": [
          { "value": "Figure", "value_original": "Figure", "confidence": 0.98, "polygon": [[72, 696], [144, 696], [144, 720], [72, 720]] },
          { "value": "1:", "value_original": "1:", "confidence": 0.97, "polygon": [[152, 696], [180, 696], [180, 720], [152, 720]] },
          { "value": "Revenue", "value_original": "Revenue", "confidence": 0.97, "polygon": [[188, 696], [280, 696], [280, 720], [188, 720]] },
          { "value": "Growth", "value_original": "Growth", "confidence": 0.97, "polygon": [[288, 696], [360, 696], [360, 720], [288, 720]] },
          { "value": "by", "value_original": "by", "confidence": 0.99, "polygon": [[368, 696], [394, 696], [394, 720], [368, 720]] },
          { "value": "Quarter", "value_original": "Quarter", "confidence": 0.97, "polygon": [[402, 696], [488, 696], [488, 720], [402, 720]] }
        ],
        "page_index": 0
      },
      {
        "class_name": "inline_formula",
        "polygon": [[72, 210], [240, 210], [240, 240], [72, 240]],
        "confidence": 0.88,
        "value": "E = mc²",
        "confidence_text": 0.88,
        "lines": [
          { "value": "E = mc²", "value_original": "E = mc²", "confidence": 0.88, "polygon": [[72, 210], [240, 210], [240, 240], [72, 240]] }
        ],
        "words": [
          { "value": "E", "value_original": "E", "confidence": 0.99, "polygon": [[72, 210], [88, 210], [88, 240], [72, 240]] },
          { "value": "=", "value_original": "=", "confidence": 0.99, "polygon": [[96, 210], [112, 210], [112, 240], [96, 240]] },
          { "value": "mc²", "value_original": "mc²", "confidence": 0.88, "polygon": [[120, 210], [240, 210], [240, 240], [120, 240]] }
        ],
        "page_index": 0
      },
      {
        "class_name": "table",
        "polygon": [[72, 260], [552, 260], [552, 460], [72, 460]],
        "confidence": 0.95,
        "page_index": 0,
        "tables": [
          {
            "status": "SUCCESS",
            "reason": "File Successfully Read",
            "read": {
              "table": {
                "row_count": 4,
                "column_count": 3,
                "cells": [
                  {
                    "row_index": 0, "column_index": 0, "row_span": 1, "column_span": 1,
                    "is_header": true, "is_projected_row_header": false,
                    "value": "Quarter", "confidence_text": 0.98,
                    "polygon_text": [[82, 272], [170, 272], [170, 294], [82, 294]],
                    "polygon": [[72, 260], [232, 260], [232, 310], [72, 310]],
                    "polygon_text_detector": [[82, 272], [170, 272], [170, 294], [82, 294]],
                    "list_value_text": ["Quarter"], "confidence_polygon": 0.96
                  },
                  {
                    "row_index": 0, "column_index": 1, "row_span": 1, "column_span": 1,
                    "is_header": true, "is_projected_row_header": false,
                    "value": "Revenue", "confidence_text": 0.98,
                    "polygon_text": [[262, 272], [354, 272], [354, 294], [262, 294]],
                    "polygon": [[232, 260], [392, 260], [392, 310], [232, 310]],
                    "polygon_text_detector": [[262, 272], [354, 272], [354, 294], [262, 294]],
                    "list_value_text": ["Revenue"], "confidence_polygon": 0.96
                  },
                  {
                    "row_index": 0, "column_index": 2, "row_span": 1, "column_span": 1,
                    "is_header": true, "is_projected_row_header": false,
                    "value": "Growth (%)", "confidence_text": 0.97,
                    "polygon_text": [[422, 272], [540, 272], [540, 294], [422, 294]],
                    "polygon": [[392, 260], [552, 260], [552, 310], [392, 310]],
                    "polygon_text_detector": [[422, 272], [540, 272], [540, 294], [422, 294]],
                    "list_value_text": ["Growth (%)"], "confidence_polygon": 0.96
                  }
                ]
              }
            },
            "id": null
          }
        ]
      }
    ]
  }
}
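The cells array in a table element is a flat list, so a consumer usually needs to rebuild the two-dimensional grid. The sketch below shows one way to do that from row_index, column_index, row_span, and column_span. Both helper names are hypothetical, and repeating a spanning cell's text in every slot it covers is a design choice made here, not documented API behavior.

```python
def iter_tables(response):
    """Yield the structured table object of every successfully read table."""
    for element in response.get("read", {}).get("elements", []):
        if element.get("class_name") != "table":
            continue
        for entry in element.get("tables", []):
            if entry.get("status") == "SUCCESS":
                yield entry["read"]["table"]


def table_to_grid(table):
    """Place each cell's text into a row_count x column_count grid.

    Spanning cells repeat their value in every slot they cover;
    slots with no cell stay None.
    """
    rows, cols = table["row_count"], table["column_count"]
    grid = [[None] * cols for _ in range(rows)]
    for cell in table["cells"]:
        row_end = cell["row_index"] + cell.get("row_span", 1)
        col_end = cell["column_index"] + cell.get("column_span", 1)
        for r in range(cell["row_index"], min(row_end, rows)):
            for c in range(cell["column_index"], min(col_end, cols)):
                grid[r][c] = cell["value"]
    return grid


# Reduced copy of the table element from the sample response.
sample = {"read": {"elements": [{
    "class_name": "table",
    "tables": [{"status": "SUCCESS", "read": {"table": {
        "row_count": 4, "column_count": 3,
        "cells": [
            {"row_index": 0, "column_index": 0, "row_span": 1, "column_span": 1, "value": "Quarter"},
            {"row_index": 0, "column_index": 1, "row_span": 1, "column_span": 1, "value": "Revenue"},
            {"row_index": 0, "column_index": 2, "row_span": 1, "column_span": 1, "value": "Growth (%)"},
        ]}}, "id": None}],
}]}}

grids = [table_to_grid(table) for table in iter_tables(sample)]
print(grids[0][0])
```

The sample response lists only the three header cells even though row_count is 4, so the remaining rows of the rebuilt grid stay filled with None.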

Request ID

A request identifier is generated for every request made to this endpoint. It is returned in the response headers under Request-Id.
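Logging this identifier alongside each call can make it easier to trace a specific request later. A minimal sketch of reading it is shown below; the helper name is hypothetical, the identifier value in the example is a made-up placeholder, and the case-insensitive lookup reflects general HTTP header semantics rather than anything specific to this API.

```python
def get_request_id(headers):
    """Return the Request-Id header value, matching the name case-insensitively.

    Accepts any mapping-like object with .items(), such as a plain dict
    or the headers object of a urllib response.
    """
    for name, value in headers.items():
        if name.lower() == "request-id":
            return value
    return None


# "example-request-id" is a placeholder, not a real identifier format.
example_headers = {"Content-Type": "application/json", "Request-Id": "example-request-id"}
print(get_request_id(example_headers))
```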


Responses

Responses specific to this endpoint, in addition to the general responses described in Errors.

200 - OK

Request with a readable document image

Response

{
  "status": "SUCCESS",
  "reason": "File Successfully Read",
  // ...
}

400 - Bad Request

Request without the image form-data field

Response

{
  "status": "NO_FILE",
  "reason": "No file in request body",
  // ...
}

415 - Unsupported Media Type

Request with an unsupported file format

Response

{
  "status": "FILE_INVALID_FORMAT",
  "reason": "Failed to process invalid file format. Please upload the correct file format",
  // ...
}
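Because every response body carries a status and a reason, a client can branch on the status field rather than on the HTTP code alone. The sketch below is one hypothetical way to turn non-SUCCESS results into exceptions; LayoutReadError and ensure_success are names invented for this example.

```python
class LayoutReadError(Exception):
    """Raised when the response status is anything other than SUCCESS."""

    def __init__(self, status, reason):
        super().__init__(f"{status}: {reason}")
        self.status = status
        self.reason = reason


def ensure_success(payload):
    """Return the "read" object for a SUCCESS payload, otherwise raise."""
    status = payload.get("status")
    if status != "SUCCESS":
        raise LayoutReadError(status, payload.get("reason", ""))
    return payload.get("read", {})


# Payload copied from the 400 response above.
try:
    ensure_success({"status": "NO_FILE", "reason": "No file in request body"})
except LayoutReadError as error:
    print(error.status, "-", error.reason)
```

Keeping status and reason on the exception lets callers decide separately how to handle NO_FILE, FILE_INVALID_FORMAT, and FAILED.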