Chapter 6 Querying Full-Text Data Flashcards

1
Q

What is full-text search?

A

Full-text search enables you to run full-text queries against character-based data in SQL Server tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the prerequisites before you start using full-text predicates and functions?

A

You must create full-text indexes inside full-text catalogs. The full-text indexes are on character-based columns in your database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What does full-text search allow you to search for?

A

(1) Simple Terms - One or more specific words or phrases (2) Prefix Terms - Terms the words or phrases begin with (3) Generation Terms - Inflectional forms of the words (4) Proximity Terms - words or phrases close to another word or phrase (5) Thesaurus Terms - synonyms of a word (6) Weighted Terms - words or phrases that use custom weight values (7) Statistical Semantic Search - Key phrases in a document (8) Similar Documents - similarity defined by key phrases.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

On what types of columns can you create full-text indexes?

A

CHAR, VARCHAR, NCHAR, NVARCHAR, TEXT, NTEXT, IMAGE, XML, VARBINARY(MAX)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What do you need in order to use full-text search?

A

(1) Check whether the Full-Text Search is installed: SELECT SERVERPROPERTY(‘IsFullTextInstalled’) (2) Full-text indexes must be created on character data or document (binary/image/xml) columns (3) For documents, you need appropriate filters (ifilters) that allow you to extract the textual information and remove the formatting from the documents. You can check which filters are installed: EXEC sys.sp_help_fulltext_system_components ‘filter’. (4) After you install the filter pack, register the filters: EXEC sys.sp_fulltext_service ‘load_os_resources’, 1; (5) You may need to restart SQL Server.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a word breaker?

A

Identifies individual words or tokens. Tokens are inserted in a full-text index in compressed format. Word breakers are language specific.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a stemmer?

A

Stemmers generate inflectional forms of a word based on the rules of a language. Stemmers are language specific.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are stoplists?

A

You can prevent indexing “noise” words by creating stoplists of stopwords. This helps to prevent bloating a full-text index with words that don’t help your searches.

You can check current stoplists and stopwords using:

SELECT stoplist_id, names
FROM sys.fulltext_stoplists;

SELECT stoplist_id, stopword, language
FROM sys.fulltext_stopwords;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a thesaurus file?

A

Full-text queries can search not only for words you provide in a query, they can search for synonyms as well. SQL Server finds synonyms in thesaurus files. Each language has an associated XML thesaurus file. You can edit the thesaurus files and configure the following elements: (1) diacritics_sensitive - 0 accent insensitive/1 accent sensitive, (2) expansion - expansion word “author” to “writer” in order to link the two, (3) replacement - “Windows 2008” could be a replacement for “Win 2k8”. After you edit the thesaurus file for a specific language, you must load it with the following stored procedure call: EXEC sys.sp_fulltext_load_thesaurus_file 1033; (1033 is the language id - US-EN).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Can full-text queries search on document properties?

A

Yes. Which properties can be searched depends on the document filter. You can create a search property list to define searchable properties for you documents.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Where are full-text indexes stored?

A

Full-text indexes are stored in full-text catalogs. A full-text catalog is a virtual object, a container for full-text indexes. As a virtual object it does not belong to any filegroup.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Can you store indexes from the same full-text catalog to different filegroups?

A

Yes. A full-text catalog is a virtual object only; full-text indexes are physical objects. You can store each full-text index from the same catalog to a different file group.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is statistical semantic search?

A

It gives you deeper insight into documents by extracting and indexing statistically relevant key phrases. Full-text search uses these key phrases to identify and index documents that are similar or related.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Which database do you have to install in order to enable the Semantic Search feature?

A

You need the semanticsdb database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How can you create synonyms for the words searched?

A

You can add synonyms by editing the thesaurus file.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are the two predicates that can be used with full-text indexes?

A

CONTAINS and FREETEXT

17
Q

What is the CONTAINS predicate?

A

You can use the CONTAINS predicate in your WHERE clause to search for the following: (1) words or phrases in text, (2) exact or fuzzy matches, (3) inflectional forms of a word, (4) text in which a search word is close to another word, (5) synonyms of a searched word, (6) a prefix of a word or phrase. Also, you can add custom weights to the words you’re searching for.

18
Q

What is the form of CONTAINS to find an exact match - a simple term?

A

WHERE CONTAINS(FTcolumn, ‘SearchWord1’)

19
Q

What is the form of CONTAINS that uses logical operators?

A

WHERE CONTAINS(FTcolumn, ‘SearchWord1 OR SearchWord2’). You can also use AND and AND NOT logical operators and change the order of eval using parenthesis.

20
Q

What is the form of CONTAINS that finds an exact match for a phrase?

A

WHERE CONTAINS(FTcolumn, ‘SearchWord1 SearchWord2’)

21
Q

What is the form of CONTAINS that finds a word that starts with a certain prefix?

A

WHERE CONTAINS(FTcolumn, ‘SearchWord1*’)

22
Q

What is the form of CONTAINS where the FTcolumn contains SearchWord1 and SearchWord2? The simplest custom proximity term. Search for the occurrence of both words no matter what the distance and order of the terms.

A

WHERE CONTAINS(FTcolumn, ‘NEAR(SearchWord1, Searchword2)’) - Note that the order of the search terms is not important.

23
Q

What is the form of CONTAINS where the FTcolumn contains SearchWord1 and SearchWord2 and specifies how many nonsearch terms can maximally be between the searched terms?

A

WHERE CONTAINS(FTcolumn, ‘NEAR((SearchWord1, SearchWord2), distance)’) - Note that order of the search terms is not important.

24
Q

What is the form of CONTAINS where the FTcolumn contains SearchWord1 and SeachWord2 and specifies the max distance between the two search terms, but also takes a flag to indicate if the order of the search terms is important?

A

WHERE CONTAINS(FTcolumn, ‘NEAR((SearchWord1, SearchWord2), distance, flag)’); The flag can take TRUE or FALSE; the default is FALSE. If flag is TRUE, then the order of the search terms is important.

25
Q

What is the form of CONTAINS where any inflectional forms of SearchWord1 are found?

A

WHERE CONTAINS(FTcolumn, ‘FORMSOF(INFLECTIONAL, SearchWord1)’)

26
Q

What is the form of CONTAINS where synonyms for SearchWord1 are found?

A

WHERE CONTAINS(FTcolumn, ‘FORMSOF(THESAURUS, SearchWord1)’) - Note that the synonyms will come from the thesaurus file.

27
Q

What is the form of CONTAINS where the terms are weighted?

A

WHERE CONTAINS(FTcolumn, ‘ISABOUT(SearchWord1 weight(w1), SearchWord2 weight(w2))’) - Note that the weights have an influence on the rank of the documents returned. CONTAINS does not rank, but the weighted form is useful for the CONTAINSTABLE function.

28
Q

What is the form of CONTAINS allowing you to search a document’s properties?

A

WHERE CONTAINS(PROPERTY(FTcolumn, ‘PropertyName’), ‘SearchWord1’) - Note that you’re searching for rows with documents that have the property PropertyName that contains the value SearchWord1.

29
Q

What is the FREETEXT predicate?

A

The FREETEXT predicate is less specific and thus returns more rows than the CONTAINS predicate. With FREETEXT, the engine performs word breaking of the search phrase, generates inflectional forms (does the stemming), and identifies a list of expansions or replacements for the words in the searched term with words from the thesaurus., e.g. FREETEXT(FTcolumn, ‘SearchWord1 SearchWord2’) - here, you’re searching for rows where FTcolumn includes any inflectional forms, synonyms, etc.

30
Q

Which predicate CONTAINS or FREETEXT is more specific?

A

You use the CONTAINS predicate for more specific searches. FREETEXT can be used for more general searches.

31
Q

What are the two full-text table-valued functions?

A

CONTAINSTABLE and FREETEXTTABLE

32
Q

What are the three semantic search table-valued functions?

A

SEMANTICKEYPHRASETABLE (semantic key phrase table), SEMANTICSIMILARITYTDETAILSTABLE (semantic similarity details table), and SEMANTICSIMILARITYTABLE (semantic similarity table).

33
Q

What do CONTAINSTABLE and FREETEXTTABLE return?

A

They return two columns: KEY and RANK. The KEY column is the unique key from the index used in the KEY INDEX clause of the CREATE FULLTEXT INDEX statement. RANK returns an ordinal value between 0 and 1000. This is the rank value. It tells you how well a row matches your search criteria. There is a different rank calculation between the two functions since the latter does not support the majority of the parameters that the first one does like proximity and weight.

34
Q

What is the form of the CONTAINSTABLE function?

A

CONTAINSTABLE ( table , { column_name | ( column_list ) | * } , ‘ ‘
[ , LANGUAGE language_term]
[ , top_n_by_rank ] )

35
Q

What is the form of the FREETEXTTABLE function?

A

FREETEXTTABLE (table , { column_name | (column_list) | * }
, ‘freetext_string’
[ , LANGUAGE language_term ]
[ , top_n_by_rank ] )

36
Q

For the CONTAINSTABLE and FREETEXTTABLE functions, what does the top_n_by_rank parameter do?

A

The top_n_by_rank parameter is an integer that specifies that only the n rows with the highest rank should be returned in the rowset.

37
Q

For the CONTAINSTABLE and FREETEXTTABLE functions, what does the language_term parameter do?

A

This is the language SQL Server uses for word breaking, stemming, thesaurus, and stopword removal as part of the query. If no value is specified, the column’s full-text language is used. This parameter could be useful when you store documents of different languages in a single column. When you query such a column, language_term can increase the quality of matches. You specify it as an integer, representing the LCID or local identifier.

38
Q

What is the form of the SEMANTICKEYPHRASETABLE function?

A
SEMANTICKEYPHRASETABLE
(
   table,
   { column | (column_list) | * }
   [ , source_key ]
)
39
Q

What is the SEMANTICKEYPHRASETABLE function?

A

This function returns a table with key phrases associated with the full-text indexed column from the column_list.