
[Image: a license plate reading "B WELL"]

Search: Wildcarding Front-to-Back and Back-to-Front

I was recently working with a client, evaluating an implementation, and some members of the team who had inherited the solution were concerned about how wildcarding had been implemented. It turns out the implementation was correct – but the client's concerns were common ones. So let's take a look at what it takes to do search wildcarding front-to-back and back-to-front.

What is Wildcarding?

Before we get too far, it's important to explain what I mean by wildcarding. The short answer is that we're looking for the pattern of characters provided anywhere in the text being searched. In most cases, when we're doing this searching, we're doing it from the start of a word or words – because, in truth, our brains work this way; but there are occasionally times when it makes sense to search for characters beginning anywhere in a word. Algorithmically, this is a more difficult challenge to solve. As a result, most search engines support wildcarding only front-to-back instead of anywhere in a string.

To understand the algorithmic problem, it's helpful to look at a simplified picture of what SQL has to do to solve the wildcarding problem in a single field.

SQL Wildcarding

SQL has supported both forward and backward wildcarding through the LIKE keyword for some time, so it's common to assume it should "just work" in search as well. However, some kinds of wildcard operations in SQL are operationally very expensive. Let's assume we've got a database table named Books with a field named Title. If I don't have any indexes on the Title field and I use Title in the WHERE clause of a SQL statement, SQL will perform a full scan of the table. Operationally, full table scans are expensive, and we work hard to prevent them in SQL. We do this by adding indexes.

If we add an index to the Title field, we get an ordered list of titles. With this, if we're searching for a specific title, we can start in the middle of the index and jump forwards or backwards (continuing to narrow in quickly on the right place in the index) until we find the specific title we're looking for. The index contains a row identifier into the main data table, so we can quickly read out the rest of the data we need from the main table.

Simplifying away some optimizations that SQL makes: if we have a table of 100 records, without the index SQL has to read all 100 records to find the title we're looking for (and ensure there are no other matches). With an index on Title and a specific query, the maximum number of reads would be 14. Breaking this down, we can find any record in the list by bifurcating it. Rounding our 100 records up to 128, which is 2^7, that's seven reads for the index. If we don't have all the fields we need in the index, then we need to go back to the main table to get the actual record – so another seven reads (maximum). These are worst-case numbers, and in many ways I've oversimplified the impacts of caching, paging, row identifiers, etc., but the fundamentals are there so we can get a sense of the power of indexing.
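To make the arithmetic concrete, here's a quick sketch (JavaScript, with illustrative names – not anything SQL actually exposes) of finding an entry by bifurcating a sorted list and counting the reads:

// Count the index reads needed to find a title by repeatedly
// halving the sorted list, as described above.
function indexReads(sortedTitles, target) {
  var low = 0, high = sortedTitles.length - 1, reads = 0;
  while (low <= high) {
    var mid = Math.floor((low + high) / 2);
    reads++; // each probe of the index costs one read
    if (sortedTitles[mid] === target) { return reads; }
    if (sortedTitles[mid] < target) { low = mid + 1; } else { high = mid - 1; }
  }
  return reads; // not found; still only about log2(n) reads
}
// For 128 titles that's about log2(128) = 7 probes of the index,
// versus reading all 128 rows without it.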

This improvement, which gets larger as the data set gets larger, relies on the ability to order the results in the index. This in turn means that we have to at least know what the first characters are so we can look up the rest. That’s the rub. To get the efficiencies in looking up data we have to order it, and we can’t order it if we don’t know the start.

So what happens when you provide a wildcard at the end of the string in SQL? Nothing special. SQL still uses the index and just walks across all the rows that could match. When there's a wildcard at the beginning of the LIKE value, however, SQL gives up and does a full table scan – unless there's a covering index.
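To see why the two cases behave so differently, here's a rough sketch (JavaScript, illustrative only) of the work involved. The trailing wildcard can ride the ordered list; the leading wildcard has to touch every entry:

// Trailing wildcard ('ABC%'): find the start of the matching range
// in the sorted index and walk forward until entries stop matching.
function prefixMatches(sortedTitles, prefix) {
  var results = [];
  for (var i = 0; i < sortedTitles.length; i++) {
    // a real index would binary-search to the first match instead of scanning
    if (sortedTitles[i].indexOf(prefix) === 0) {
      results.push(sortedTitles[i]);
    } else if (results.length > 0) {
      break; // sorted order means we're past the matching range; stop early
    }
  }
  return results;
}

// Leading wildcard ('%ABC'): order doesn't help; every entry gets checked.
function containsMatches(titles, fragment) {
  return titles.filter(function (t) { return t.indexOf(fragment) !== -1; });
}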

Sidebar: A covering index is one that contains all of the fields needed to satisfy the query. Even if the index's order can't be used, it will sometimes be used instead of the data table, because less data would be read and it would therefore be somewhat more efficient. In our example, SQL would use our index on the Title field presuming we only asked for the Title field; it might use it if we asked for additional fields. However, there's still a full scan of the data we're interested in happening somewhere.

While the indexing approach that search uses is different from SQL's, it still obeys some of the same rules. It puts things in order to find them quickly.

Wildcarding from the Front

When you search with a trailing wildcard, it's really very similar to a search without wildcarding. Search finds the appropriate entries in the index, does some post-filtering for security, and returns them. Search is already expecting to return multiple results; it simply includes entries from the index that it would otherwise have excluded because of the characters after the search term.

Search is fast because of the indexing process. This process, while substantially more intensive than creating a SQL index because of the volume of data involved, follows the same general data-management principles. Indexes start at the front.

Multiple Values

One of the improvements of search over SQL, from a data-management perspective, is that search allows a single property or field to have multiple values. This is useful for keyword-style fields, but also when multiple data fields are mapped to the same search property. For instance, the title of the document and a field in the data management system may both be mapped to the same title property.

In SQL, if you have a single field with multiple values, it gets indexed with the first value – which is why searching multi-valued fields in SQL is difficult, and why third normal form pushes individual values into independent rows. Search, in effect, manages on its own the process that database designers do by hand in SQL. That's a good thing. It gives us the opportunity to work around back-to-front wildcarding – for a subset of the properties. Let's take a look at a license plate example to explain what we can do.

Partial Matching with License Plates

The classic data problem with wildcarding on both ends is the license plate match. The story is that a witness saw the license plate of a getaway vehicle, but unfortunately only managed to get three of the six characters on the license plate – and, more challenging, they don't know which three positions they saw. For simplicity, let's say they observed the letters ABC. Those characters would match any of the following license plates:

A B C ? ? ?
? A B C ? ?
? ? A B C ?
? ? ? A B C

When the SQL database is set up with a field for each character, you can transform the query in a way that runs each of these searches. The result is that SQL can use a set of indexes to solve the query very efficiently (presuming the indexes are correct).
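As a sketch of what that query transformation might look like (JavaScript; the Plates table and the c1 through c6 per-character column names are hypothetical):

// Build one query per starting position for a fragment of a
// six-character plate stored as one column per character.
function buildPlateQueries(fragment) {
  var queries = [];
  for (var start = 1; start <= 6 - fragment.length + 1; start++) {
    var conditions = [];
    for (var i = 0; i < fragment.length; i++) {
      conditions.push('c' + (start + i) + " = '" + fragment.charAt(i) + "'");
    }
    queries.push('SELECT plate FROM Plates WHERE ' + conditions.join(' AND '));
  }
  return queries;
}
// buildPlateQueries('ABC') yields four queries, one per position the
// fragment could start in - and each can use an index on its columns.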

We can't do this in search, so we flip the approach over: instead of transforming the query, we transform the data – using the idea of multiple values.

Partial Matching and Back-to-Front Wildcarding with Search

If you look back at the table above, you may notice something: if you were to progressively remove characters from the beginning of each license plate, you could check for a match. For instance, let's say that the getaway plate is actually ZZABCD. We would store the following values in the property:

Z Z A B C D
Z A B C D
A B C D
B C D
C D
D

In this case, if you were to search for ABC with a wildcard at the beginning and the end, you would find a match – specifically, the third value (from the third row in the table) would match. So if you can transform the incoming values such that you store a set of values for the property with progressively more leading characters stripped, the resulting property will be searchable with wildcards on both sides.
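Here's a minimal sketch of that transformation in JavaScript – generating the progressively stripped values to store in the multi-valued property:

// Produce every suffix of the value so that a wildcard-on-both-ends
// search ('*ABC*') becomes a front-to-back search ('ABC*') against
// one of the stored values.
function suffixValues(value) {
  var values = [];
  for (var i = 0; i < value.length; i++) {
    values.push(value.substring(i));
  }
  return values;
}
// suffixValues('ZZABCD') returns
// ['ZZABCD', 'ZABCD', 'ABCD', 'BCD', 'CD', 'D'] - and ABC* matches
// the third value.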

In short, by transforming a property, we can get the desired effect for a given property – with a few side effects.

Impacts of Partial and Back-to-Front Wildcarding

The first and perhaps most obvious impact of this property transformation is that it increases the amount of storage in the index. As long as the property itself is relatively small, this isn't generally a big deal. However, it does mean that you wouldn't want to do this on every property – and, in practice, you don't.

The second impact is that this strategy works for specific properties but doesn't work for the full text index. This is generally OK, because the cases where you need it are limited – but it's not a completely generalizable workaround.

Finally, there will be some impacts to ranking and relevance by doing this which are search engine-specific. It’s possible that, after implementing this strategy, you’ll have shifted the relevance of those searches which query this property.

For these reasons, it's still a good idea to consider exactly why you believe back-to-front wildcarding is appropriate for you – and whether the psychology of searching suggests a better answer.

Psychology

Except for relatively rare circumstances, our brains don’t work by picking out the middle of a string. We might be able to recall a segment of a license plate because it’s novel, but in most cases we simply don’t process information this way. We typically think of the start of a word or term, but don’t know how to put the rest of the word together. One exception to this is when we tokenize strings instead of processing them as words.

In many cases, the reason we want to support back-to-front wildcarding is that the user is in the same situation as the license plate example – the value isn't (generally) a word. It gets processed as groups of characters. One or more of the groupings may be accidentally or intentionally memorable, and the user doesn't remember the rest of the string. For instance, in a part number like A#1264#CIRBRK#US, the CIRBRK portion of the string might be memorable and something someone would want to search on. In this case, the user isn't really searching from an arbitrary starting point in the string; they're starting from a breakpoint.

Breakpoints are where a string should naturally be broken. Search engines do this all the time with language, breaking content into the distinct words that can be searched for. This is controlled by word breakers.

Word Breakers

Much of the problem that we’re facing, which is allowing users to search at the start of a word or a part of a longer string, has already been handled in the engine. Every search engine knows how to break strings into distinct words for indexing. What characters are used for breaking words can be language-dependent or set globally.

Some of the standard word breakers make sense: carriage return, space, and tab are all obvious word breakers. However, depending upon the engine you're using, hyphens, underscores, and other special characters may or may not be considered word breakers. If they're not, you get one long string for the value – if they are, it's broken up into pieces.

Consider the part number string: if # is not a breaker, the part number is indexed as the complete string. If # is a word breaker, the following get indexed: A, 1264, CIRBRK, and US. In that case, if I know I'm looking for CIRBRK, it would match (as would CIRBRK with a wildcard at the end).
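A quick sketch of the difference (JavaScript; the breaker set here is illustrative, not any engine's actual configuration):

// With '#' treated as a word breaker the part number yields separate
// searchable tokens; without it, one long token.
var partNumber = 'A#1264#CIRBRK#US';
var withBreaker = partNumber.split('#');  // ['A', '1264', 'CIRBRK', 'US']
var withoutBreaker = [partNumber];        // ['A#1264#CIRBRK#US']
// 'CIRBRK' (or 'CIRBRK*') matches a token in the first case only.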

This is important because some implementations of back-to-front wildcarding aren't necessary if the appropriate word breakers are in place. If the part number is A1264CIRBRKUS, then you definitely need the back-to-front wildcarding approach described above. However, with separators, it's more efficient not to transform the property. Like any rule, there are exceptions.

Right to Left Exceptions

You may have noticed that I've been speaking in left-to-right terms, as with most of the languages in use on the planet today. There are some languages which are processed right-to-left instead. In these cases, it's easiest to think of the right-to-left language as having its characters flipped (inverted), so the last character comes first and the first becomes last. If you do this, the characteristics are all the same. People in right-to-left languages tend to remember the right side (the start) of the word, not the left side (the end). The psychology matches even if the symbols are reversed.


Retro: Harness Properties in SharePoint Search

Originally published on 8/21/2005

Most users of SharePoint Portal Server rapidly become enamored with the ability to add new fields (containing meta data) to documents in the document library. All of a sudden it becomes possible to associate information with a file beyond the file name we've been limited to since the beginning of the computing era. However, few users have the opportunity to understand how this meta data is used by SharePoint for searching. This leads to problems when users decide that it's necessary to use SharePoint Portal Server Search to search on information contained in a field they have added. In this article you'll learn how SharePoint uses document library fields to create properties which are searchable, and how to enable searching on those properties.

The Power of Properties

SharePoint Portal Server’s search facility really works in two different ways. First, there’s the full text search. This searches across all of the text in every document that is in the index. This search is what most people think of when they think of SharePoint’s search capability.

Second, there is the property search. During the indexing process, the IFILTERs which extract the text out of the documents put property information into special property buckets, which are kept separate in the index so they can be searched separately. This allows you to set properties in your Office documents such as department, project number, author, keywords, etc., and then you'll have the ability to search on those fields individually. You can use the search engine in SharePoint to search for documents where the department is engineering and the project is 123. Where a full text search for engineering and 123 may find hundreds of entries, because the word engineering and the number sequence 123 appear in many documents, a search via properties may yield the 10 or so documents that are truly relevant to your search.

Properties are what most people believe that they are creating when they create a new field in a document library. That’s not actually true. The meta data fields in a document library don’t have anything to do with properties directly.

Office Does a Sleight of Hand

However, during the edit process Office performs a little sleight of hand. It takes the information you enter in the meta data fields for the document library and creates corresponding custom properties in the document. The net effect is that although you've only created fields in a document library, your documents now have custom properties.

These custom properties are picked up by the indexing process (more specifically the IFILTER for Office documents) and they are placed into the search index. You can then use those properties by making them available via the advanced search page in SharePoint.

However, this also means that non-Office documents don't share the same relationship between fields in the document library and the properties of the document itself. So if you're trying to develop a search mechanism for documents like TIF images or PDFs, you'll find that setting up a meta data field for those document libraries won't allow you to search for those documents directly via their properties. You'll still be able to organize the information in the document library with those fields; you just won't be able to search on it as properties.

Setting up a test

Now that we understand the basic mechanisms of how SharePoint uses meta data and properties, let's demonstrate how it works. Here's what you need to do to set up the demonstration:

  1. Create a site. For instance, I used /Sites/Test.
  2. Open the Shared Documents library by clicking on the Quick Nav Bar link to Shared Documents.
  3. Click on the Modify Settings and Columns link in the left bar.
  4. Click on the Add a new column link
  5. Enter a name in the Column name box. You can select any name you would like; I used Rob, Try, and IntranetJournal as my field names.
  6. Click the OK button.
  7. Repeat steps 4-6 for any additional columns you want to create. I repeated the process two additional times to get my two additional fields in.
  8. Click the Go Back to “Shared Documents” link to return to the Shared Documents Library.

Putting it to work

Now that you have a document library with custom fields, you can create a few new documents. Here’s what to do:

  1. Click the New Document button on the task bar.
  2. Enter some basic text in the document. (This text can be anything you would like.)
  3. Click File-Save.
  4. Give the document a name and click the Save button.
  5. Enter the meta data into the fields which are displayed. You should consider entering different text than the text you used in the document.
  6. Click the OK button.
  7. Close Word.
  8. Verify that the text that you entered is visible in the new fields on the document library.
  9. Click the document you just created in the document library list and click OK to the warning dialog if necessary.
  10. Click File-Properties. Notice that the meta data properties from the document library appear.
  11. Click the File Properties button. The standard Word document properties dialog is displayed.
  12. Click the Custom tab.
  13. Note that the meta data that you entered for the document library also exists as a custom property of the document.
  14. Click OK to close the properties dialog.
  15. Close Word.

Now you have a document in SharePoint with properties, so you can go set up the search for them.

Ensure that the Document is Indexed

Before you can search on your new property you have to first ensure that SharePoint has indexed the document. This is done from the Portal by following this procedure:

  1. Open the SharePoint Portal.
  2. Click on Site Settings in the navigation bar on the top of the window.
  3. In the Search Settings and Indexed Content section, click the Configure search and indexing link.
  4. In the Content Indexes section, click on the Manage content indexes link.
  5. Hover over the Non_Portal_Content link, drop down the menu via the arrow on the right, and click Start Full Update.
  6. Wait for the Last Update Status for the Non_Portal_Content index to display the word Idle. The page will automatically refresh itself.

Set up the Properties for Search

We had to ensure that the document was indexed so that the new properties would appear. During the indexing process, the IFILTER which processes Word files automatically created new entries in the SharePoint Search property list for the properties discovered in the document we just uploaded. The final set of steps is to enable the properties on the advanced search page. To do this, follow these steps:

  1. Open the SharePoint Portal.
  2. Click on Site Settings in the navigation bar on the top of the window.
  3. In the Search Settings and Indexed Content section, click the Manage properties from crawled documents link.
  4. Click the plus sign to the left of urn:schemas-microsoft-com:office:office.
  5. Scroll down to one of the names of the fields you added to the list above (and verified became a property). Click the property from the list.
  6. Check the Include this property in Advanced Search options checkbox.
  7. Click the OK button.
  8. Click the Return to Portal link at the top of the page.
  9. From the Start Menu select Run and then type in IISRESET and press return.
  10. Click the magnifying glass to the left of the drop down list containing the text All Sources.
  11. Expand the drop down underneath the Search by properties: label to see that your new property is available to be searched.

You can now search SharePoint Portal Server for just the field you added to the list. Of course, as you have seen, you're actually searching the property that was added to the Word document, but the effect is the same, since Office is managing the transition to the document properties.


SharePoint Search across the Globe

Several of my global clients have approached me over the last few weeks in some stage of planning or implementing a global search solution. So I wanted to take a few moments to talk through global search configuration options, including the general perceptions we have, the research on how users process options, the technology limitations – and the options. The goal here is to be a primer for a conversation about how to create a search configuration that works for a global enterprise.

Single Relevance

There's only one way to get a single relevance ranking across all pieces of content – that is, to have the same farm do the indexing for every piece of content. Because relevance is based on many factors – including how popular various words are, etc. – the net effect is that if you want everything to be in exactly the right relevance order, you'll have to do all of the indexing from a single farm. (To address the obvious questions from my SharePoint readers: neither FAST nor SharePoint 2013 resolves this problem.)

OK, so if you accept that in order to accomplish the utopian goal of having all search results for the entire globe in a single relevance-ordered list one farm is going to have to index everything, you'll have to have one massive search index. This means you'll have to plan on bringing everything across the wire – and that's where the problems begin.

Search Indexing

In SharePoint (and this mostly applies to all search engines), the crawler component indexes all of the content by loading it locally (through a protocol handler, in the case of SharePoint), breaking it into meaningful text (IFilter), and finally recording it into the search database. This is a very intensive process, and by its very nature it requires that all of the bits for a file travel across the network from the source server to the SharePoint server doing the indexing. Generally speaking, this isn't an issue for local servers, because most local networks are fairly idle – there's not an abundance of traffic on them, so the additional traffic caused by indexing isn't that big of a deal. However, the story is very different in the case of the wide area network.

In a WAN, most of the segments are significantly slower than their LAN counterparts. Consider that a typical LAN segment is 1 Gbps, while a typical WAN connection is at most measured in megabits. Let's take a big example of a 30 Mbps connection. That means the LAN is roughly 33 times faster. For smaller locations that might be running on 1.544 Mbps connections, the multiplier is much larger (~650). This level of difference is monumental. Also consider that most WAN connections run at around 80% utilization during the day.

Consider for a moment that if you want to bring every bit of information from a 500 GB database across a 1.544 Mbps connection, it will take about a month – not counting overhead or inefficiency. The problem with this is what happens when you need to do a full index or a content index reset.
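If you want to check my math, here's the arithmetic as a quick sketch (JavaScript, using decimal units and ignoring protocol overhead):

// LAN-to-WAN speed ratio and the full-transfer time for the content.
var lanBps = 1e9;       // 1 Gbps LAN segment
var t1Bps = 1.544e6;    // 1.544 Mbps T1 WAN link
console.log(lanBps / t1Bps);   // ~648 - the "~650" multiplier above
var dbBits = 500 * 1e9 * 8;    // 500 GB expressed in bits
var days = dbBits / t1Bps / 86400;
console.log(days);             // ~30 - "about a month"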

Normally, the indexing process is looking for new content, and just reads and indexes that. That generally isn't a big deal: we read and consume much more content than we create, so maybe 1% of the information in the index changes in a given day – in practical terms it's really much less than this. Pulling one percent of the data across the wire isn't that hard. If you're doing incremental indexes every hour or so, you'll probably complete each incremental index before the next one kicks off. (Generally speaking, in my SharePoint environments incremental indexing takes about 15 minutes every hour.) However, occasionally your search index becomes "corrupt". I don't mean that in the "the world is going to end" kind of way; just that an entry won't have the right information. In most cases you won't know that the data is wrong – it just won't be returned in search results. The answer to this is to periodically run a full crawl to recrawl all the content.

During the time that the full crawl is running, incremental crawls can't run. As a result, while the indexer is recrawling all of the content, some of the recently changed content isn't being indexed. Users will perceive the index to be out of date – because it will be. If it takes a month to do a complete index of the content, then the search index may be as much as a month out of date. Generally speaking, that's not going to be useful to users.

You'll schedule full crawls on a periodic basis – sometimes monthly, sometimes quarterly. Very rarely, however, you'll have a search event that leads to you needing to reset the content index. In these cases the entire index is deleted and then a full crawl begins. This is worse than a regular full crawl because the index won't just be out of date – it will be incomplete.

In short, the amount of data that has to be pulled across the wire to have a single search index just isn't practical. It's a much lower data requirement to pass user queries along to regionally deployed servers and aggregate the results on one page.

One Global Deployment

Some organizations have addressed this concern with a single global deployment of SharePoint – and certainly this does resolve the issue of a single set of search results, but at the expense of everyday performance for the remote regions. I've recommended single global deployments for some organizations because of their needs – and regional deployments for other situations. The assumption I'm making in this post is that your environment has regional farms to minimize latency between users and their data.

Federated Search Web Parts

Out of the box there is a federated search web part. This web part will pass the query for the page to a remote OpenSearch 1.0/1.1-compliant server and display the results. Out of the box it is configured to connect to Microsoft's Bing search engine. You can connect it to other search engines as well – including other SharePoint farms in different regions of the globe. The good news is that this allows users to issue a single search and get back results from multiple sources; however, there are some technical limitations, some of which may be problematic.

Server Based Requests

While it's not technically required by the specifications, the implementation that SharePoint includes has the Federated Search Web Parts processing the remote queries on the server – not on the client. That means the server must be able to connect to all of the locations that you want to use for federated search. In practical terms this may not be that difficult, but most folks frown on their servers having unfettered access to the Internet. As a result, having the servers run the federated searches may mean some firewall and/or proxy server changes.

The good news here is that federated search locations must be configured in the search service – so you'll know exactly which servers need to be reachable from the host SharePoint farm. The bad news is that if you're making requests to other farms in your environment, you'll need a way to pass user authentication from one server to another – and in internal situations that's handled by Kerberos.

Kerberos

All too many implementations I go into don't have Kerberos set up as an authentication protocol – or, more frequently, their clients are authenticating with NTLM rather than Kerberos for a variety of legitimate and illegitimate reasons. Let me start by saying that Kerberos, when correctly implemented, will help the performance of your site, so outside of the conversation about delegating authentication it's a good thing to implement in your environment.

Despite the relative fear in the market about setting up and using Kerberos, it really is as simple as setting SharePoint/IIS to use it (Negotiate), setting the service principal name (SPN) of the URL used to access the service on the service account, and setting the service account up for delegation. In truth, that's it. It's not magic – however, it is hard to debug. As a result, most people give up on setting it up. Fear of Kerberos and what's required to set it up correctly falls into what I would consider an illegitimate reason.

There is a legitimate reason why you might not be able to use Kerberos. Kerberos is mutual authentication. It requires that the workstation be trusted – which means it has to be a domain-joined PC. If you've got a large contingent of staff who don't have domain-joined machines, you'll find that Kerberos won't work for you.

Kerberos is required for one server to pass along the identity of a user to another server. This trusted delegation of a user's identity isn't supported through NTLM (or NTLMv2). In our search case, SharePoint trims the search results to only those a user can see – and thus the remote servers being queried need the identity of the user making the request. This is a problem if the authentication is being done via NTLM, because that authentication can't be passed along – and as a result you won't get any results. So in order to use the out-of-the-box federated search web parts against another SharePoint farm, you must have Kerberos set up and configured correctly.

Roll Your Own

Of course, just because the out-of-the-box web parts use a server-side approach to querying the remote search engine – and therefore need Kerberos to work for security trimming – doesn't mean that you have to use them. It's certainly possible to write your own JavaScript-based web part that issues the query from the client side, and therefore has the client transmit its own authentication to the remote server. However, as a practical matter this is more challenging than it first appears because of the transformation of results through XSLT. In my experience, clients haven't opted to build their own federated web parts.
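For the curious, here's a rough sketch of the client-side approach (JavaScript; the endpoint URL, the query parameter, and the result handling are all assumptions for illustration, not a real SharePoint API):

// Issue the query from the browser so the user's own credentials flow
// to the remote farm - no server-side delegation (Kerberos) required.
function federatedQuery(endpointUrl, terms, onResults) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', endpointUrl + '?q=' + encodeURIComponent(terms), true);
  xhr.withCredentials = true; // send the user's own authentication
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      onResults(xhr.responseXML); // RSS/Atom to transform and render yourself
    }
  };
  xhr.send();
}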

User Experience

From a user experience perspective, the first thing users will notice when using the federated search web parts is that the results are in different "buckets" – and they're unlikely to like this. As we saw at the start of this post, there's not much that can be done to resolve this from a technical perspective without creating larger issues of how "fresh" the index is. So while admittedly this isn't our preference from a user experience perspective, there aren't great answers for resolving it.

Before dismissing this entirely, I need to say that there are some folks who have decided to live with the fact that relevance won't be exactly right and are comingling the results – dealing with the set of issues that arise from that, including how to manage paging and what to do about faceted search refinement (that is, the property-value selection typically on the left-hand side of the page). When you're pulling from multiple sources, you have to aggregate these refiners and manage paging yourself. This turns out to be a non-trivial exercise, and one that doesn't appear to improve the situation much.

Hick's Law

One of the most often misused "laws" in user experience design is Hick's Law. It states, basically, that given one longer list of items versus two smaller lists, a user will be able to find what they're looking for faster out of the one list. (Sorry, this is a gross oversimplification; follow the link for more details.) The key is that this oversimplification ignores two key facts. First, the user must understand the ordering of the results. Second, they must understand what they're looking for – that is, they have to know the exact language being used. In the case of search, neither of these two requirements will be met. The ordering is non-obvious, and the exact title of the result is rarely known by the user who's searching.
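For reference, Hick's Law is usually written as a decision time that grows with the logarithm of the number of choices:

T = b * log2(n + 1)

where n is the number of equally likely choices and b is an empirically measured constant. The logarithmic growth assumes the chooser can subdivide the choices – for instance, because the list is ordered – which is exactly what's missing in search results.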

What this means is that although intuitively we "know" that having all the results in a single list will be better, the research doesn't support this position. In fact, some of the research quoted by Barry Schwartz in The Paradox of Choice seems to indicate that meaningful partitioning can be very valuable in reducing anxiety and improving performance. I'm not advocating that you break up search results that you can get into a single list – rather, I'm saying that we may have a perception that comingled results are of higher value than they actually are.

Refiners and Paging

One of the challenges with the federated search user experience is that the facets will be driven off the primary results, so results from other geographies won't show in the refiners list. Nor is there paging on the federated search web parts. As a result, the federated results web parts should be viewed as "teasers" – inviting users to take the highly relevant results or to click over to the other geography to refine their searches. The federated search web part includes the concept of "more…" to reach the federated source's own results page. Ideally the look and feel – and global navigation – between the search locations will be similar, so as not to be a jarring experience for users.

Putting it Together

Having a single set of results may not be feasible from a technology standpoint today. However, with careful consideration of how users search and how they view search results, you can build easy-to-consume experiences for the user. Relying on a model where users have regional deployments for their search needs – which provides some geographic division between results but also minimizes the total number of places they need to go for search – can help users find what they're looking for quickly and easily.


Adding a Google Search Option in Your SharePoint 2010 Search Scopes Without Code

It turns out that integrating Internet search providers into the search scopes drop-down list in SharePoint is relatively easy and doesn't require any code. Here's how you can do it.

Create the Scope

If you want to create a global scope, you can go into Central Administration, then Service Management for the Search Service, and finally Scopes on the left; or you can create the scope at the site collection level by selecting the Search Scopes option from Site Settings.

Once there, you need to create a new scope and set it up as described below.

Note that you'll want to check the Search Dropdown checkbox to get the scope to show up in the search scopes drop-down, and that the target results page the scope refers to must actually be created. Also, you'll need to add a rule to the scope. Once you hit OK you'll be back at the list of scopes, and there will be a link to add rules. You should add a rule for all content (because it's simple).

Once you hit OK you've got a fully functioning scope that's ready for use – just as soon as the system gets around to compiling it. While we're waiting, let's go set up the page.

Creating the Redirect Page

In my case, I'm using a simple search center for this demo (SRCHLITE), so I went in, copied the default.aspx page, and removed the existing web parts. Then I added a Content Editor Web Part with some JavaScript to do a redirect to the Google search page. Here's the bit that's important: SharePoint automatically appends the search terms to the query string with a k= (k is for keyword). Google needs the query to be a q=, so we'll have our JavaScript change the k= to a q=. The script looks like this:

<script language="javascript">
// SharePoint passes the search terms on the query string as k=;
// Google expects q=, so swap the parameter and redirect.
var queryString = window.location.search;
if (queryString.indexOf('k=') != -1) {
  var fullUrl = 'http://www.google.com/search' + queryString.replace('k=', 'q=');
  window.location.replace(fullUrl);
}
</script>

You'll notice that the script checks whether there's a k= in the query string and only redirects if there is one – this is so we can manage the page without being redirected. I wrapped this script up into a Content Editor Web Part (which you can get here). I added the web part to my page, and it was ready to redirect me to Google.

Processing Scopes

If I were patient (and I'm not), I could have just waited for everything to be ready. However, I can also go back into Search management (through Central Administration): if a Scopes needing update notice is shown, there's also a Start update now link you can click to force search to compile the scopes for use.

Once the search scope is compiled you can enable the scopes drop down list and use the link to Google.

Turning on Scopes for Most Pages

For the standard search box, you can enable the scopes drop-down list by going to Site Settings – Search Settings and changing the search dropdown mode to show scopes.

Once you hit OK, everywhere there's a search box in the site collection should now have a scopes drop-down list – except a search page, which has its own settings.

Turning on Scopes for a Search Page

For a search page, you can turn on the scopes drop-down box by placing the page in edit mode, then selecting Edit Web Part from the web part control menu (upper right). Expand the Scopes Dropdown section and select Show scopes dropdown. This will cause the scopes to show.

Testing

To test the setup, just go into search, select the Google search scope, and enter a search term.


Article: Secure Search Can Protect Your Sensitive Information

Two guys walk into an office. The first one asks, "Do you know what's different between searching the Internet and searching your intranet?" The second one exclaims, "Just about everything!" Sometimes it can seem like everything you know about searching on the Internet just doesn't apply to your intranet. You expect that when you search the Internet you'll find something. (It may not be the right thing, but that's not relevant right now.)

eDiscovery activities can leverage search to dig up more dirt than a Caterpillar convention. Do you want to know the best data discovery tool that auditors have right now? It's your search engine. You've already indexed the content. All they need is an account and they can find any piece of information that you didn't want found by just anyone in the organization. Maybe they're searching for credit card numbers for a PCI audit, or social security numbers for a PII audit; whatever it is, the search tools are going to find it. And yet, when you want to find something, you're left out in the cold.

Read More…


Article: Training Search to be Your Adult Learning Hero

“Eyes forward. If you can’t pay attention, I’ll rap your knuckles with my ruler.” This may be an echo of a strict Catholic education or it may be a hyperbole of how your child is being trained at school, but either way, it doesn’t have a place in how you educate the adult learners in your organization.

Malcolm Knowles, in his book The Adult Learner: The Definitive Classic in Adult Education and Human Resource Development, discusses andragogy – learning for adults – and why it's different from pedagogy – learning for children. The conclusion is that there are six key assumptions about adult learning:

  • Need to Know
  • Foundation
  • Self-Concept
  • Readiness
  • Orientation
  • Motivation

Putting these together into a single context, it's clear that adult learners need to be trained at the moment in time that they need the learning (readiness), to be told why they need to know a piece of information (need to know), to have the foundational concepts necessary to integrate the new information (foundation), and to have an understanding of the problem they are trying to solve (self-concept). The training must be focused on solving problems (orientation), and the motivation for learning must map to the internal motivations of the student (motivation).

Read More…


Don’t mess with SharePoint’s Site Property

It's been a while since I've stumbled across a defect in SharePoint, so I was about due. (I'm not saying that it's a buggy product – I'm saying that if you do what I do, you just get used to finding defects.) This one was interesting. SharePoint has what are called dynamic scopes ('This Site' and 'This List'), and for this particular client they didn't work. They never returned results. You could search with a different scope and you would find the content, so I knew it was being crawled correctly. However, when I would search with one of the dynamic scopes, no dice. What made it more fun is that only one of the two SSPs on the farm had the problem – so we knew it wasn't corrupt binaries.

After a ton of research and some good work by escalation engineering, we realized that the 'Site' managed property in the SSP had been deleted and recreated. Oops. That's bad. Somewhere SharePoint relies on that not happening (generally because it refers to the property via a well-known ID number instead of its name). The issue has been reported, but it's not going to get fixed. The net result is that we get to recreate the SSP.

That wouldn't be so bad except that this customer has targeted content. So? Well, audiences may have friendly names, but what's stored in the pages and web parts for targeting is the GUID of the audience – and when you create an audience, you can't specify the GUID, even through the API. Gary Lapointe has some STSADM command extensions that will allow you to import and export audiences. That's a good start, but that still means coming up with some code to enumerate all of the pages and all of the web parts on the pages looking for targeting – and changing the GUIDs where it's found. The good news is that the field isn't really protected, so you can do the replacement pretty easily. The bad news is that no one has created a tool to walk all of the targeting in a web (or farm) and make those GUID changes. (It would be quite useful.) We'll probably repair the targeting manually, because it will take us longer to build and test a tool than to do it by hand…


Search Center vs. Search Center Lite

The topic of the week appears to be the difference between the two Search Centers in SharePoint. One search center, Search Center Lite – which shows up in the user interface as Search Center – is created for you by default if you create a Collaboration Portal. (It's at /search.) The other search center, Search Center with Tabs, only shows up if you activate the Office SharePoint Server Standard Site Collection features.

Once you've activated the feature, your create site page will include the full list of templates, including Document Center, Records Center, Personalization Site, Site Directory, Report Center, Search Center with Tabs, and Search Center.

If your create site page only shows Records Center, Report Center, and Search Center, you don't have the feature activated.

So what's the big deal? The differences aren't that big, are they? Side by side, the standard search center (Search Center Lite) and the Search Center with Tabs show the same search box – the visible difference is basically tabs. Who cares? Well, if you have a set of complex customizations and want people to be able to search in different ways – then you care. Search Center with Tabs uses the publishing features (WCM) in SharePoint to allow you to create your own pages with different search configurations on them.


Web Cast: SharePoint for Internet Site Development: Search

On December 18th, 2008, I'll be doing an hour-long web cast on how to leverage SharePoint Search for Internet-facing sites. The web cast will be available as a recording after the event. To register, or to watch the recorded event, go to: http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032396824&Culture=en-US


TechEd EMEA 2008: Connecting Office Client, SharePoint, Search, and Workflow

Earlier today I presented on Connecting Office Client, SharePoint, Search, and Workflow. It's one of my favorite talks because it shows an end-to-end solution: getting data from end users, promoting the properties into SharePoint, making them searchable, and taking action via workflows. In the presentation I referred to a few tools and resources.

Good luck on your next project making it easier to get data in – and used – in your organization.