ASP 101 - Active Server Pages 101 - Web04

Using Index Server to Search Your Web Site - Noise Words

by John Peterson

Introduction

Since writing my article on using Microsoft's Index Server from ASP (Part 1, Part 2), I've gotten quite a few questions from people who aren't getting the results they expect. There are a number of reasons why this might happen, but one of the most common is that the query includes one or more words that Index Server considers "noise words." This article explains what noise words are and shows you how to edit the list of words that Index Server treats as noise words.

The Email

You can thank Yu Zhang for finally getting me to write this article. I've answered quite a few questions about noise words, but his came at the issue from a slightly different angle. Here's his email:

Hi John,

Your article "Using Index Server to Search Your Web Site", helped me a lot, but, I'm having a problem with 'reserved words' such as 'i' and 'about'. To deal with this problem, I need to find the list of Index Server's reserved words so I can filter them.

Do you know where to find the list?

Thanks,
Yu

As it turns out, I didn't know where to find the list... so, I found out. Having taken the time to do so, I figured I should share the info with everyone.

What are Noise Words?

Noise words are words that are very common and yet carry very little meaning. Words like 'a', 'an', 'the', 'to', 'so', 'with', etc. are found in almost all documents but provide very little information about what a document is actually about. Therefore, there is very little value to be gained from knowing that a document contains any of them. Because of this, Index Server is designed to ignore these types of words when it builds an index from a set of documents.

So, to answer Yu's question from above, you can find the list of all the words that Index Server considers noise words in the System32 folder of your Windows directory. There you'll find a bunch of files named noise.xxx, where xxx represents the language in question. For US English, the file name is noise.enu. On my laptop, the complete path to this file is C:\Windows\System32\noise.enu. The file is a plain text file and you can open and edit it using the text editor of your choice (Windows' Notepad works fine).
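Since Yu wanted the list so he could filter those words out of queries, here is a minimal sketch of that idea in Python. It is only an illustration, not something Index Server provides: the function names are made up for this example, the path is the one from my laptop, and the code assumes the file is just whitespace-separated words in a single-byte encoding, so check your own copy of the file before relying on it.

```python
from pathlib import Path

# Path taken from the article; adjust it for your own Windows directory.
NOISE_FILE = Path(r"C:\Windows\System32\noise.enu")


def parse_noise_words(text):
    """Return the set of lowercase words found in noise-file text.

    Assumption: words are separated by whitespace (spaces or newlines).
    """
    return {word.lower() for word in text.split()}


def strip_noise_words(query, noise_words):
    """Drop every term of `query` that appears in `noise_words`."""
    kept = [term for term in query.split() if term.lower() not in noise_words]
    return " ".join(kept)


if __name__ == "__main__" and NOISE_FILE.exists():
    # Assumption: latin-1 is a safe way to read the file as text.
    noise = parse_noise_words(NOISE_FILE.read_text(encoding="latin-1"))
    print(strip_noise_words("i want to know about index server", noise))
```

The same logic ports easily to whatever language builds your search query; the point is simply to load the list once and drop matching terms before the query is sent.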

Editing the List of Noise Words

So why would you want to add or remove a word? Let's say your site is named "ASP 101" and every page title includes the phrase "ASP 101". In that case, searching for "ASP" would be pretty pointless, since it would return every single document, and that really defeats the point of searching, doesn't it? To avoid this problem, we might add "ASP" and "101" to the list of noise words. Index Server would then ignore them while indexing, which produces a smaller index and faster search results. It would also prevent users from searching for "ASP" and getting back an unmanageable set of results.

Editing the noise word list is basically as simple as editing the text file. As always, you should make a backup copy before you do so. There are a few other caveats, but they are all discussed in Microsoft's Knowledge Base Article #247561, "How to Edit Index Server Noise-Word Lists", so I won't go into them here.
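If you would rather script the change than open Notepad, the backup-then-edit step might look something like the sketch below. Treat it as an illustration under assumptions: the function name and the ".bak" backup name are invented for this example, the file is assumed to hold whitespace-separated words in a single-byte encoding, and the code does nothing about the other caveats covered in the Knowledge Base article.

```python
import shutil
from pathlib import Path


def add_noise_words(noise_file, new_words, encoding="latin-1"):
    """Back up `noise_file`, then append any words it does not already list.

    Returns the list of words that were actually appended.
    Assumption: the file holds whitespace-separated words.
    """
    noise_file = Path(noise_file)
    # Hypothetical backup name; an existing backup is never overwritten,
    # so the original list survives repeated runs.
    backup = noise_file.with_name(noise_file.name + ".bak")
    if not backup.exists():
        shutil.copy2(noise_file, backup)

    text = noise_file.read_text(encoding=encoding)
    existing = {word.lower() for word in text.split()}

    to_add = []
    for word in new_words:
        lowered = word.lower()
        if lowered not in existing and lowered not in to_add:
            to_add.append(lowered)

    if to_add:
        with noise_file.open("a", encoding=encoding) as handle:
            # Start on a fresh line in case the file lacks a trailing newline.
            if text and not text.endswith("\n"):
                handle.write("\n")
            handle.write("\n".join(to_add) + "\n")
    return to_add
```

For the example above, a call such as add_noise_words(r"C:\Windows\System32\noise.enu", ["ASP", "101"]) would append both words if they are not already listed. Try it on a copy of the file first.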

That's All Folks

I hope this article has helped shed some light on the topic of noise words for all of you using Index Server. And, keep on sending in those questions... someone has to tell me what you guys want to read about.

As an aside... I just love how much support Microsoft gives Index Server. Check out all the information at the Index Server Support Center. I realize it's not their flagship product or anything, but come on guys... give us something!
