JobId: #1400464

Text Extraction from Multiple Document Formats

Categories: Java,Software Related (Includes Websites)
Posted By: kdawson  (0 ratings)
Source: Rent-a-coder
Job viewed: 356 times
Bids Placed: 0
Start Date: 2010-04-30 08:48:50
End Date: 2010-05-14 00:18:17
Time Remaining:
Deadline:
Current Phase: Bidding open
Max Accepted Bid: Open to fair suggestions
Project Type:
Bid Type: Open Auction
Brief Summary:

We require a skilled Java programmer to enhance an existing stand-alone component that extracts text from different document formats. The text extraction component is a framework composed of three projects: interfaces, implementation, and tests. The current requirements are:

- improving text extraction techniques for a given narrative or content body;
- enhancing the handling of indexed and/or tabular data within a narrative;
- extending the capture of metadata, including content organization;
- integrating OCR functionality;
- etc.
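
To make the interfaces/implementation split concrete, here is a minimal sketch of what an extraction contract and one trivial implementation might look like. The actual interfaces are not shown in this posting, so every name below (TextExtractor, ExtractedDocument, PlainTextExtractor, the "charCount" metadata key) is hypothetical. The code avoids language features newer than Java 1.6.

```java
import java.nio.charset.Charset;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical contract: the real "interfaces" project is not shown in the posting.
interface TextExtractor {
    // Reports whether this extractor handles the given MIME type.
    boolean supports(String mimeType);

    // Turns raw document bytes into text plus metadata.
    ExtractedDocument extract(byte[] content);
}

// Immutable result holder: the narrative text and any captured metadata.
final class ExtractedDocument {
    private final String text;
    private final Map<String, String> metadata;

    ExtractedDocument(String text, Map<String, String> metadata) {
        this.text = text;
        // Defensive copy so later changes by the caller do not leak in.
        this.metadata = Collections.unmodifiableMap(
                new LinkedHashMap<String, String>(metadata));
    }

    String getText() {
        return text;
    }

    Map<String, String> getMetadata() {
        return metadata;
    }
}

// Simplest possible implementation: treats the bytes as UTF-8 plain text.
final class PlainTextExtractor implements TextExtractor {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    public boolean supports(String mimeType) {
        return "text/plain".equals(mimeType);
    }

    public ExtractedDocument extract(byte[] content) {
        String text = new String(content, UTF8);
        Map<String, String> metadata = new LinkedHashMap<String, String>();
        // "charCount" is an illustrative metadata key, not one named in the posting.
        metadata.put("charCount", String.valueOf(text.length()));
        return new ExtractedDocument(text, metadata);
    }
}
```

Under this assumed design, format-specific extractors (PDF, ODF, OOXML, and so on) would each implement the same interface, and the tests project would exercise them through it.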

To be considered for this task, providers should have some experience with text extraction, including but not limited to wrapper induction techniques, open source libraries, and MUC-style template processing. Preference will be given to those who show familiarity with several text presentation standards, such as ODF, OOXML, PDF, HTML, and/or MS OLE-2. Although interfaces have been defined, there is a great deal of scope for both creativity and ingenuity in the implementation. Component wiring is based on the Spring Framework. Performance is assessed with JUnit tests.

Requirements Interview Answers:
To help you bid more accurately, the buyer was interviewed about the requirements for this bid request. Below are their answers.
Bid request Type: What kind of work do you need done?
Software related (Includes desktop applications and internet websites)
Bid request Parts: What do you want the worker to do on this bid request?
Programming: The worker will take the requirements and translate them into the language of the computer (and test it).
Program Type: What kind of software should the worker create (and/or install)?
  • A desktop or server program: This software runs on a user's own PC/workstation, or on a server.
Desktop / server program info
Size of application: How many screens/forms need to be created/edited in this application?
Exactly 0.
Programming Language: What programming language(s) do you want your application written in?
I do know the language(s).
Language(s):
  • Java
Misc. details: I think it would be helpful to know a number of string algorithms and to be proficient with regular expressions.
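
As an illustration of the kind of regex work this might involve, the sketch below splits a line of plain text into table cells. It is only an example under stated assumptions: the class name TableRowSplitter and the rule that a tab or a run of two or more spaces separates columns are hypothetical and do not come from the existing component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class TableRowSplitter {
    // Assumption: a tab, or two or more consecutive spaces, separates columns.
    private static final Pattern COLUMN_GAP = Pattern.compile("\\t| {2,}");

    private TableRowSplitter() {
    }

    // Splits one line into trimmed, non-empty cells.
    static List<String> splitRow(String line) {
        List<String> cells = new ArrayList<String>();
        for (String cell : COLUMN_GAP.split(line.trim())) {
            String trimmed = cell.trim();
            if (trimmed.length() > 0) {
                cells.add(trimmed);
            }
        }
        return cells;
    }

    // Crude heuristic: a line with at least two cells may belong to a table.
    static boolean looksTabular(String line) {
        return splitRow(line).size() >= 2;
    }
}
```

For example, TableRowSplitter.splitRow("Name   Qty\tPrice") yields the three cells Name, Qty, and Price, while an ordinary sentence with single spaces is returned as one cell.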

Operating system(s): What operating system(s) do you want your application to work on?
I don't know (and need the worker's assistance to suggest it).
Details: There should be no OS dependencies. This should be managed by the Java Virtual Machine (1.6).
Database: Will this bid request include a database?
No, it does not include a database.
Legal: 1) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased.
1b) No part of the deliverable may contain any copyright restricted 3rd party components (including GPL, GNU, Copyleft, etc.) unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the worker's Worker Legal Agreement.
Other Requirements:
Categories:
(Note: Like everything else on this page, these categories are part of the original contract for this bid request.)
Languages, Java, Software related (includes websites)