Type: Industrial invention patent
Title: Object extraction from presentation-oriented documents using a semantic and spatial approach
Year of publication: 2017
Kind of patent: International
Number: US9582494 B2
Country: US
Language: English
Abstract: Automatic extraction of objects in a presentation-oriented document comprises receiving the presentation-oriented document (POD), in which content elements are spatially arranged in a given layout organization for presenting contents to human users; receiving a set of descriptors that semantically define the objects to extract from the POD based on the attributes that make up the objects; using the set of descriptors to identify content elements in the POD that match the attributes in the set of descriptors defining the objects, and assigning semantic annotations to the identified elements based on the descriptors; creating a semantic and spatial document model (SSDM) containing the spatial structures of the identified content elements in the POD and the semantic annotations assigned to the identified content elements; extracting the identified content elements from the POD based on the set of descriptors and the SSDM to create a set of object instances; and performing at least one of: i) using the object instances to generate semantic and spatial wrappers that can be reused on a different POD, and ii) storing the object instances in a data repository.
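The abstract describes a pipeline of descriptor matching, semantic annotation, an SSDM, instance extraction, and wrapper reuse. The following is a minimal Python sketch of that flow under strongly simplifying assumptions that are not taken from the patent: each content element is a text fragment with (x, y) coordinates, each descriptor maps attribute names to regular expressions, the SSDM is reduced to rows grouped by vertical position, and a "wrapper" is just the mean x-position of each attribute. All names (`ContentElement`, `Descriptor`, `annotate`, `build_ssdm`, `extract_instances`, `learn_wrapper`, `apply_wrapper`) are hypothetical illustrations, not the patented method.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ContentElement:
    """Hypothetical minimal POD element: a text fragment and its page position."""
    text: str
    x: float  # left coordinate
    y: float  # top coordinate


@dataclass
class Descriptor:
    """Hypothetical descriptor: object name plus attribute name -> regex for its value."""
    object_name: str
    attributes: dict[str, str]


def annotate(pod, descriptor):
    """Identify elements matching a descriptor attribute and attach that attribute as a semantic label."""
    annotated = []
    for el in pod:
        for attr, pattern in descriptor.attributes.items():
            if re.fullmatch(pattern, el.text):
                annotated.append((el, attr))
                break  # in this sketch the first matching attribute wins
    return annotated


def build_ssdm(annotated, row_tolerance=2.0):
    """Toy SSDM: group annotated elements into rows of similar vertical position."""
    rows = []
    for el, attr in sorted(annotated, key=lambda pair: (pair[0].y, pair[0].x)):
        if rows and abs(rows[-1]["y"] - el.y) <= row_tolerance:
            rows[-1]["cells"].append((el, attr))
        else:
            rows.append({"y": el.y, "cells": [(el, attr)]})
    return rows


def extract_instances(ssdm, descriptor):
    """Create one object instance per row that carries every attribute of the descriptor."""
    instances = []
    for row in ssdm:
        values = {attr: el.text for el, attr in row["cells"]}
        if set(values) == set(descriptor.attributes):
            instances.append(values)
    return instances


def learn_wrapper(ssdm):
    """Toy spatial wrapper: the mean x-position observed for each attribute."""
    xs = defaultdict(list)
    for row in ssdm:
        for el, attr in row["cells"]:
            xs[attr].append(el.x)
    return {attr: sum(v) / len(v) for attr, v in xs.items()}


def apply_wrapper(wrapper, pod, row_tolerance=2.0):
    """Reuse a wrapper on a different POD: label each element by its nearest learned column."""
    labelled = [(el, min(wrapper, key=lambda a: abs(wrapper[a] - el.x))) for el in pod]
    return build_ssdm(labelled, row_tolerance)


# Usage on a tiny invented catalogue page.
descriptor = Descriptor("Product", {"name": r"[A-Z][a-z]+", "price": r"\d+\.\d{2}"})
pod = [
    ContentElement("PRODUCTS", 0, 0),  # heading: matches no attribute pattern
    ContentElement("Apple", 10, 20), ContentElement("1.20", 80, 20),
    ContentElement("Pear", 10, 30), ContentElement("0.95", 80, 30),
]
ssdm = build_ssdm(annotate(pod, descriptor))
repository = []  # stand-in for the data repository
repository.extend(extract_instances(ssdm, descriptor))
print(repository)  # [{'name': 'Apple', 'price': '1.20'}, {'name': 'Pear', 'price': '0.95'}]
```

The wrapper learned from one page (`learn_wrapper(ssdm)`) can then be passed to `apply_wrapper` with the elements of another page whose layout is similar, and the resulting rows fed back into `extract_instances`. Real PODs would need far richer spatial structures than rows and columns; this sketch only shows how the semantic and spatial steps connect.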
Key words:
- Information extraction
- Presentation-oriented document
- Semantic method
- Spatial approach