Monday, March 19, 2012

help a newbie

i've been a sql server dba for the last six years, but we're just now
going to implement full text searches so i've got some dumb questions.
1) is the only way to make the fulltext searches realtime, ie document
is indexed immediately after being added to the db, is to enable change
tracking and update index in the background and add a timestamp column
to each table to be indexed?
2) once the change tracking is working, is there any reason to schedule
full or incremental population?
3) my catalog has 4 tables in it. is there any way to enable change
tracking and update index in background for the entire catalog or do i
have to go to each table and enable it?
4) is it better to have one catalog per db or one catalog per table?
5) the population really kills my cpu. my development server has dual
p3 at 500mhz and 1 gig of ram. the test db has about 500mb of
text/image data (pdf, doc, ppt, etc) in it. the population takes about
10 minutes to run and the mssdmn process stays at a minimum of 50% cpu.
is this normal? this goes along with #3 because everytime i go enable
change tracking on a table, it starts a repopulation.
6) what's more important for populating fulltext indexes, cpu speed or
disk speed?
7) people can upload any kind of document into these 4 tables including
jpg, gif, bmp. fulltext tries to index these and generates an error in
eventlog. any way to tell it to not index files of those types?
8) backup and recovery is very important here, as it should be
everywhere. what's the best way to restore a db that has fulltext
indexes? restore from backup and rebuild catalogs or try to restore the
catalog files along with the db?
CH,
See answers to your questions inline below starting with [jtkane].
Regards,
John
"ch" <ch@.dontemailme.com> wrote in message
news:408531A7.E2AF907A@.dontemailme.com...
> i've been a sql server dba for the last six years, but we're just now
> going to implement full text searches so i've got some dumb questions.
> 1) is the only way to make the fulltext searches realtime, ie document
> is indexed immediately after being added to the db, is to enable change
> tracking and update index in the background and add a timestamp column
> to each table to be indexed?
[jtkane] - Yes and assuming you're using SQL Server 2000 as CT and UIiB are
new features in SQL Server 2000. Although adding a timestamp column is not
required for CT and UIiB, but recommended as when you enable a populated FT
Catalog, CT will automaticlly run an Incremental Population to re-sync the
table with the FT Catalog, otherwise a Full Population will be executed.

> 2) once the change tracking is working, is there any reason to schedule
> full or incremental population?
[jtkane] - A Full Population, no, unless you have a need to change the
related noise word file, i.e., add or remove noise words. As for
Incremental, yes, depending upon the amount of change, i.e,
inserts/updates/deletes that affect over 60% of the rows in the FT enabled
table. See SQL Server BOL title "Maintaining Full-Text Indexes" for more
info on this.

> 3) my catalog has 4 tables in it. is there any way to enable change
> tracking and update index in background for the entire catalog or do i
> have to go to each table and enable it?
[jtkane] - CT and UIiB are set at the table level, so you would need to
enable each table. Note, this can be easly done in the Enterprise Manager.
However, considering the answer to number 2 above, and if these are large
(>1 million rows) tables, you should do enable these tables one at a time,
until the re-sync Incremental Population has completed and then enable the
next table...

> 4) is it better to have one catalog per db or one catalog per table?
[jtkane] - The answer to this question is that it depends. See the last
paragraph in SQL Server BOL title "Full-text Search Recommendation" for more
info on this question. Additionally, having mutiple tables in one FT
Catalogs can affect the value of RANK, if you use this column in your
CONTAINSTABLE or FREETEXTTABLE queries.

> 5) the population really kills my cpu. my development server has dual
> p3 at 500mhz and 1 gig of ram. the test db has about 500mb of
> text/image data (pdf, doc, ppt, etc) in it. the population takes about
> 10 minutes to run and the mssdmn process stays at a minimum of 50% cpu.
> is this normal? this goes along with #3 because everytime i go enable
> change tracking on a table, it starts a repopulation.
[jtkane] - Yes. This is is normal and expected during either the "shadow
merge" or "Master Merge" processes that the MSSearch service does to merge
new "word lists" into it's file system and then at the end of this process
or at midnight (controllable via a registry key). This process occurs during
either a Full or Incremental Population and at midnight for CT & UIiB
enabled tables.
For multi-proc servers, you can set the CPU affinity of the MSSearch service
(or any other service) via launching the following at the AT command prompt
as the sysadmin of the machine where SQL Server is installed using the
following syntax:
at <current_time_plus_1_min> /interactive taskmgr.exe
then once the TaskMgr is running, right-click on the MSSearch service and
set the CPU to one CPU and then use sp_configure and set the CPU affinity
for SQL Server to the other CPU for your dual proc server. This will ensure
that the MSSearch CPU usage will not affect your SQL Server processes during
it's high CPU usage periods.

> 6) what's more important for populating fulltext indexes, cpu speed or
> disk speed?
[jtkane] - All are important, depending what you are trying to optmize. See
SQL Server BOL title "Full-text Search Recommendation" for more info on this
issue. Note, the amount of L3 cache on the CPU is also important.

> 7) people can upload any kind of document into these 4 tables including
> jpg, gif, bmp. fulltext tries to index these and generates an error in
> eventlog. any way to tell it to not index files of those types?
[jtkane] - Yes. All errors, warnings and informational events for FTS and
MSSearch are recorded in the server's Appliation event log and not SQL
Server's errorlog files.

> 8) backup and recovery is very important here, as it should be
> everywhere. what's the best way to restore a db that has fulltext
> indexes? restore from backup and rebuild catalogs or try to restore the
> catalog files along with the db?
[jtkane] - Yes, I agree. For more info on backing up and restoring FT
Catalogs, see KB article 240867 (Q240867) "INF: How to Move, Copy, and
Backup Full-Text Catalog Folders and Files" at
http://support.microsoft.com/default...b;EN-US;240867
Note, SQL Server 2005 (codename Yukon) will fully integrate FT Catalogs into
SQL Server's backup and restore commands.

>
>

No comments:

Post a Comment