JobId: #1400464
Text Extraction from Multiple Document Formats

Categories: Java, Software Related (Includes Websites)
Posted By: kdawson (0 ratings)
Source:
Job viewed: 356 times
Bids Placed: 0
Start Date: 2010-04-30 08:48:50
End Date: 2010-05-14 00:18:17
Time Remaining:
Deadline:
Current Phase: Bidding open
Max Accepted Bid: Open to fair suggestions
Project Type:
Bid Type: Open Auction

Brief Summary: We require a skilled Java programmer to enhance an existing stand-alone component that extracts text from different document formats. The text extraction component is a framework composed of three projects: interfaces, implementation and tests.

The current requirements include:
- improving text extraction techniques for a given narrative or content body;
- enhancing the handling of indexed and/or tabular data within a narrative;
- extending the capture of metadata, including content organization;
- integrating OCR functionality.

To be considered for this task, providers should have some experience with text extraction, including but not limited to wrapper induction techniques, open source libraries, and MUC-style template processing. Preference will be given to those who show familiarity with several text presentation standards: ODF, OOXML, PDF, HTML, and/or MS OLE-2.

Although interfaces have been defined, there is a great deal of scope for both creativity and ingenuity in the implementation. Component wiring is based on the Spring Framework. Performance is measured by JUnit tests.

To help you bid more accurately, the buyer was interviewed about the requirements for this bid request. Below are their answers. (Note: Like everything else on this page, these categories are part of the original contract for this bid request.)

Languages, Java, Software related (includes websites)
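
For bidders who want a concrete picture of the interfaces/implementation split described in the summary, a minimal sketch follows. It is illustrative only: every name in it (`TextExtractor`, `ExtractionResult`, `ExtractionException`, `PlainTextExtractor`, `ExtractorRegistry`) is hypothetical and not taken from the existing component, the UTF-8 charset is an assumption, and Spring wiring is left out so that the example runs on the plain JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical result type: the extracted text plus any captured metadata. */
final class ExtractionResult {
    private final String text;
    private final Map<String, String> metadata;

    ExtractionResult(String text, Map<String, String> metadata) {
        this.text = text;
        // Defensive copy so callers cannot mutate the result afterwards.
        this.metadata = Collections.unmodifiableMap(new LinkedHashMap<>(metadata));
    }

    String getText() { return text; }
    Map<String, String> getMetadata() { return metadata; }
}

/** Hypothetical unchecked exception for extraction failures. */
final class ExtractionException extends RuntimeException {
    ExtractionException(String message, Throwable cause) { super(message, cause); }
}

/** Hypothetical contract that would live in the "interfaces" project. */
interface TextExtractor {
    boolean supports(String mimeType);
    ExtractionResult extract(InputStream in);
}

/** Trivial implementation; PDF, ODF or OOXML extractors would follow the same shape. */
final class PlainTextExtractor implements TextExtractor {
    @Override
    public boolean supports(String mimeType) {
        return "text/plain".equals(mimeType);
    }

    @Override
    public ExtractionResult extract(InputStream in) {
        try {
            // Assumption: input is UTF-8; a real extractor would detect the charset.
            String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("charset", "UTF-8");
            metadata.put("length", String.valueOf(text.length()));
            return new ExtractionResult(text, metadata);
        } catch (IOException e) {
            throw new ExtractionException("Could not read plain-text input", e);
        }
    }
}

/** Picks the first registered extractor that supports the given MIME type. */
final class ExtractorRegistry {
    private final List<TextExtractor> extractors;

    ExtractorRegistry(List<TextExtractor> extractors) {
        this.extractors = List.copyOf(extractors);
    }

    ExtractionResult extract(String mimeType, InputStream in) {
        for (TextExtractor extractor : extractors) {
            if (extractor.supports(mimeType)) {
                return extractor.extract(in);
            }
        }
        throw new IllegalArgumentException("No extractor registered for " + mimeType);
    }
}

final class ExtractionDemo {
    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry(List.of(new PlainTextExtractor()));
        InputStream in = new ByteArrayInputStream("Hello, world".getBytes(StandardCharsets.UTF_8));
        ExtractionResult result = registry.extract("text/plain", in);
        System.out.println(result.getText());
        System.out.println(result.getMetadata());
    }
}
```

In a Spring-based setup, the list handed to `ExtractorRegistry` would presumably be injected by the container rather than built by hand, and each format-specific extractor could be covered by its own JUnit test in the tests project.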

