1/5/2016 5:02:52 PM

SAP HANA Text Analysis

Text Analysis is the process of analyzing unstructured text, extracting the relevant information, and transforming it into structured information that can be used in different ways.
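As a rough illustration of "unstructured in, structured out", the sketch below pulls e-mail addresses and ISO-style dates out of a free-text note using regular expressions. This is not how SAP HANA implements text analysis; the function name extract_facts, the patterns, and the sample note are assumptions made up for this example.

```python
import re

def extract_facts(text):
    """Turn free text into a small structured record (illustrative only)."""
    return {
        # Simplified e-mail pattern; real-world addresses need stricter rules.
        "emails": re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text),
        # Dates written as YYYY-MM-DD.
        "dates": re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text),
    }

note = "Contact jane.doe@example.com before 2016-01-05 about the HANA upgrade."
print(extract_facts(note))
```

The resulting dictionary could be stored in ordinary table columns, which is the sense in which the information becomes structured.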

A few important techniques are used in Text Analysis:

Full Text Search

Full Text Indexing

Fuzzy Search

Full Text Search:

Full text search is designed to perform linguistic (language-based) searches against text and documents stored in your database. 

In a full-text search, the search engine examines all of the words in every stored document as it tries to match search criteria (text specified by a user). 

Full Text Indexing:

When dealing with a small number of documents, the full-text search engine can scan the contents of the documents directly for each query, a strategy called "serial scanning." This is what some rudimentary tools, such as grep, do when searching.
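Serial scanning can be sketched in a few lines of Python. The serial_scan function and the sample documents are hypothetical, chosen only to show that every query re-reads every document.

```python
def serial_scan(documents, term):
    """Return the ids of documents whose text contains `term` (case-insensitive)."""
    needle = term.lower()
    hits = []
    for doc_id, text in documents.items():
        # Each query walks through the full text of every document.
        if needle in text.lower():
            hits.append(doc_id)
    return hits

docs = {
    1: "SAP HANA supports full text search",
    2: "Fuzzy search tolerates typos",
    3: "A full text index speeds up search",
}
print(serial_scan(docs, "full text"))
```

The cost of each query grows with the total amount of stored text, which is why this approach only suits small collections.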

However, when the number of documents to search is potentially large, the problem of full-text search is often divided into two tasks: indexing and searching. 

The indexing stage will scan the text of all the documents and build a list of search terms (often called an index). In the search stage, when performing a specific query, only the index is referenced, rather than the text of the original documents. 

The indexer will make an entry in the index for each term or word found in a document, and possibly note its relative position within the document. 
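The two stages can be sketched as a small inverted index. This is a minimal illustration under simple assumptions (lowercased words split on non-word characters, everything held in memory), not the index structure SAP HANA actually builds; build_index and search are names invented for the example.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Indexing stage: map each word to (document id, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        words = re.findall(r"\w+", text.lower())
        for position, word in enumerate(words):
            index[word].append((doc_id, position))
    return index

def search(index, term):
    """Search stage: consult only the index, never the original text."""
    postings = index.get(term.lower(), [])
    return sorted({doc_id for doc_id, _ in postings})

docs = {
    1: "SAP HANA supports full text search",
    2: "Fuzzy search tolerates typos",
    3: "A full text index speeds up search",
}
index = build_index(docs)
print(search(index, "full"))   # ids of documents containing "full"
print(index["full"])           # postings with word positions
```

Storing the position alongside the document id is what later makes phrase or proximity queries possible without rereading the documents.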

Conceptually, full-text indexes support searching on columns in the same way that indexes support searching through books. 

Fuzzy Search:

Fuzzy search, also known as approximate string matching, is the technique of finding strings that match a pattern approximately rather than exactly.

It is a type of search that finds matches even when users misspell words or enter only partial words.
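One common way to measure "approximately" is the Levenshtein edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one string into another. The sketch below combines that distance with a simple prefix check to cover partial words. It is an assumption-laden illustration, not SAP HANA's fuzzy scoring; fuzzy_search, the max_distance threshold, and the vocabulary are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

def fuzzy_search(query, vocabulary, max_distance=2):
    """Return words that start with the query or lie within max_distance edits of it."""
    query = query.lower()
    matches = []
    for word in vocabulary:
        candidate = word.lower()
        if candidate.startswith(query) or edit_distance(query, candidate) <= max_distance:
            matches.append(word)
    return matches

vocabulary = ["analysis", "index", "search"]
print(fuzzy_search("analisys", vocabulary))  # misspelled word
print(fuzzy_search("ind", vocabulary))       # partial word
```

A fixed threshold of two edits is a blunt instrument for very short words; production engines usually normalize the score by word length.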
