Chapter 6 Querying Full-Text Data Flashcards Preview

70-461 > Chapter 6 Querying Full-Text Data > Flashcards

Flashcards in Chapter 6 Querying Full-Text Data Deck (39)
Loading flashcards...

What is full-text search?

Full-text search enables you to run full-text queries against character-based data in SQL Server tables.


What are the prerequisites before you start using full-text predicates and functions?

You must create full-text indexes inside full-text catalogs. The full-text indexes are on character-based columns in your database.


What does full-text search allow you to search for?

(1) Simple Terms - One or more specific words or phrases (2) Prefix Terms - Terms the words or phrases begin with (3) Generation Terms - Inflectional forms of the words (4) Proximity Terms - words or phrases close to another word or phrase (5) Thesaurus Terms - synonyms of a word (6) Weighted Terms - words or phrases that use custom weight values (7) Statistical Semantic Search - Key phrases in a document (8) Similar Documents - similarity defined by key phrases.


On what types of columns can you create full-text indexes?



What do you need in order to use full-text search?

(1) Check whether the Full-Text Search is installed: SELECT SERVERPROPERTY('IsFullTextInstalled') (2) Full-text indexes must be created on character data or document (binary/image/xml) columns (3) For documents, you need appropriate filters (ifilters) that allow you to extract the textual information and remove the formatting from the documents. You can check which filters are installed: EXEC sys.sp_help_fulltext_system_components 'filter'. (4) After you install the filter pack, register the filters: EXEC sys.sp_fulltext_service 'load_os_resources', 1; (5) You may need to restart SQL Server.


What is a word breaker?

Identifies individual words or tokens. Tokens are inserted in a full-text index in compressed format. Word breakers are language specific.


What is a stemmer?

Stemmers generate inflectional forms of a word based on the rules of a language. Stemmers are language specific.


What are stoplists?

You can prevent indexing "noise" words by creating stoplists of stopwords. This helps to prevent bloating a full-text index with words that don't help your searches.

You can check current stoplists and stopwords using:

SELECT stoplist_id, names
FROM sys.fulltext_stoplists;

SELECT stoplist_id, stopword, language
FROM sys.fulltext_stopwords;


What is a thesaurus file?

Full-text queries can search not only for words you provide in a query, they can search for synonyms as well. SQL Server finds synonyms in thesaurus files. Each language has an associated XML thesaurus file. You can edit the thesaurus files and configure the following elements: (1) diacritics_sensitive - 0 accent insensitive/1 accent sensitive, (2) expansion - expansion word "author" to "writer" in order to link the two, (3) replacement - "Windows 2008" could be a replacement for "Win 2k8". After you edit the thesaurus file for a specific language, you must load it with the following stored procedure call: EXEC sys.sp_fulltext_load_thesaurus_file 1033; (1033 is the language id - US-EN).


Can full-text queries search on document properties?

Yes. Which properties can be searched depends on the document filter. You can create a search property list to define searchable properties for you documents.


Where are full-text indexes stored?

Full-text indexes are stored in full-text catalogs. A full-text catalog is a virtual object, a container for full-text indexes. As a virtual object it does not belong to any filegroup.


Can you store indexes from the same full-text catalog to different filegroups?

Yes. A full-text catalog is a virtual object only; full-text indexes are physical objects. You can store each full-text index from the same catalog to a different file group.


What is statistical semantic search?

It gives you deeper insight into documents by extracting and indexing statistically relevant key phrases. Full-text search uses these key phrases to identify and index documents that are similar or related.


Which database do you have to install in order to enable the Semantic Search feature?

You need the semanticsdb database.


How can you create synonyms for the words searched?

You can add synonyms by editing the thesaurus file.


What are the two predicates that can be used with full-text indexes?



What is the CONTAINS predicate?

You can use the CONTAINS predicate in your WHERE clause to search for the following: (1) words or phrases in text, (2) exact or fuzzy matches, (3) inflectional forms of a word, (4) text in which a search word is close to another word, (5) synonyms of a searched word, (6) a prefix of a word or phrase. Also, you can add custom weights to the words you're searching for.


What is the form of CONTAINS to find an exact match - a simple term?

WHERE CONTAINS(FTcolumn, 'SearchWord1')


What is the form of CONTAINS that uses logical operators?

WHERE CONTAINS(FTcolumn, 'SearchWord1 OR SearchWord2'). You can also use AND and AND NOT logical operators and change the order of eval using parenthesis.


What is the form of CONTAINS that finds an exact match for a phrase?

WHERE CONTAINS(FTcolumn, 'SearchWord1 SearchWord2')


What is the form of CONTAINS that finds a word that starts with a certain prefix?

WHERE CONTAINS(FTcolumn, 'SearchWord1*')


What is the form of CONTAINS where the FTcolumn contains SearchWord1 and SearchWord2? The simplest custom proximity term. Search for the occurrence of both words no matter what the distance and order of the terms.

WHERE CONTAINS(FTcolumn, 'NEAR(SearchWord1, Searchword2)') - Note that the order of the search terms is not important.


What is the form of CONTAINS where the FTcolumn contains SearchWord1 and SearchWord2 and specifies how many nonsearch terms can maximally be between the searched terms?

WHERE CONTAINS(FTcolumn, 'NEAR((SearchWord1, SearchWord2), distance)') - Note that order of the search terms is not important.


What is the form of CONTAINS where the FTcolumn contains SearchWord1 and SeachWord2 and specifies the max distance between the two search terms, but also takes a flag to indicate if the order of the search terms is important?

WHERE CONTAINS(FTcolumn, 'NEAR((SearchWord1, SearchWord2), distance, flag)'); The flag can take TRUE or FALSE; the default is FALSE. If flag is TRUE, then the order of the search terms is important.


What is the form of CONTAINS where any inflectional forms of SearchWord1 are found?



What is the form of CONTAINS where synonyms for SearchWord1 are found?

WHERE CONTAINS(FTcolumn, 'FORMSOF(THESAURUS, SearchWord1)') - Note that the synonyms will come from the thesaurus file.


What is the form of CONTAINS where the terms are weighted?

WHERE CONTAINS(FTcolumn, 'ISABOUT(SearchWord1 weight(w1), SearchWord2 weight(w2))') - Note that the weights have an influence on the rank of the documents returned. CONTAINS does not rank, but the weighted form is useful for the CONTAINSTABLE function.


What is the form of CONTAINS allowing you to search a document's properties?

WHERE CONTAINS(PROPERTY(FTcolumn, 'PropertyName'), 'SearchWord1') - Note that you're searching for rows with documents that have the property PropertyName that contains the value SearchWord1.


What is the FREETEXT predicate?

The FREETEXT predicate is less specific and thus returns more rows than the CONTAINS predicate. With FREETEXT, the engine performs word breaking of the search phrase, generates inflectional forms (does the stemming), and identifies a list of expansions or replacements for the words in the searched term with words from the thesaurus., e.g. FREETEXT(FTcolumn, 'SearchWord1 SearchWord2') - here, you're searching for rows where FTcolumn includes any inflectional forms, synonyms, etc.


Which predicate CONTAINS or FREETEXT is more specific?

You use the CONTAINS predicate for more specific searches. FREETEXT can be used for more general searches.