Even with today’s advanced review tools, search terms remain a critical component in many matters. At ProSearch, our primary search system is dtSearch.
A few key concepts in eDiscovery search to keep in mind:
- Search and retrieval vary across systems—understanding exactly how information is indexed and how search functions operate in a given system is essential.
- Search in discovery has a different intent than internet or knowledge base searching and therefore requires different skills.
- Keywords, without refinement and testing, are both under- and over-inclusive.
- Search term development should be a qualitative process. Quality is measured through an iterative process that includes testing and sampling.
- Search terms, when used appropriately with a sound methodology, can yield good results.
- Defensibility requires a sound methodology and a documented process.
Following are the ProSearch team’s favorite tips and tricks to improve searching in dtSearch:
1. Searching with &
Challenge:
The dtSearch system reserves the ampersand as an operator character, used for synonym searching, where words with meanings related to the entered search term will also be returned. Relativity’s dtSearch does not have access to a thesaurus, so synonym searching does not function. When searching for a term that actually includes the ampersand, such as in a law firm name (Butler & Jones, for example), the term returns zero results. Putting the term in quotes is not a solution.
Solution:
Remove the ampersand and run the two words as a single phrase, separated only by a space (for example, Butler Jones).
Note that this method will return documents containing several variations, such as Butler & Jones, Butler Jones, butler.jones, etc. To search only for the exact phrase of Butler & Jones, a regular expression must be used.
2. Searching with #
Challenge:
The number sign (also called the hash or pound sign) is likewise reserved by dtSearch as an operator character: it is the phonic searching operator, which returns terms that “sound like” the input term. For example, a search for #dog would return dog, but it would also return dock and duck. Most uses of # in search terms are not intended as the phonic operator, however; they stand in for the word “number,” as in the common terms pin# or pin #. Unfortunately, both of these terms return zero documents in a default index, regardless of how often they occur in the data.
Solution:
Remove the operator character and run the simplified term, such as pin, to return the correct set of documents. To search only for the exact phrase of pin#, a regular expression must be used.
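The simplification described in tips 1 and 2 can be sketched as a small preprocessing step applied to a term list before it is run. This is a minimal illustration in Python, not part of dtSearch or Relativity: the function name simplify_term and the constant RESERVED_OPERATOR_CHARS are hypothetical, and only the two operator characters discussed above are handled.

```python
import re

# Assumption: only the two operator characters covered in tips 1 and 2.
# A real term list may need other reserved characters added here.
RESERVED_OPERATOR_CHARS = "&#"


def simplify_term(term: str) -> str:
    """Replace reserved operator characters with spaces, then tidy whitespace."""
    cleaned = "".join(
        " " if ch in RESERVED_OPERATOR_CHARS else ch for ch in term
    )
    # Collapse the runs of spaces left behind and trim the ends.
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    for raw in ["Butler & Jones", "pin#", "pin #"]:
        print(f"{raw!r} -> {simplify_term(raw)!r}")
```

As noted above, the simplified terms are broader than the originals, so the results still need to be reviewed for unintended variations.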
3. Order of Operations
Challenge:
In dtSearch, the default order for processing operator words is: (1) OR, (2) proximity operators, (3) AND. If the same operator word appears more than once in a term, the instances at the same level are parsed from left to right, as in English reading order. As a result, a term that mixes operators without parentheses may be parsed differently than its author intended.
Solution:
Include parentheses to change the order in which components of a term are parsed and ensure the term is interpreted in the intended manner.
Examples:
- The term apple AND banana OR grape w/5 mango without parentheses is interpreted by dtSearch as (apple AND ((banana OR grape) w/5 mango)). In this instance, (banana OR grape) is processed first, then ((banana OR grape) w/5 mango), and finally the whole term.
This may be the intended parsing of this term, but that is not always the case.
- For the term stonewall OR stone w/2 wall without parentheses, dtSearch will interpret this as (stonewall OR stone) w/2 wall, which is not the likely intended outcome of looking for stonewall as either one word or two.
Always include parentheses in complex terms, both for clarity and so that the term keeps the correct parsing order when it is edited later.
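To check how an unparenthesized term would be grouped under the order described above, the precedence rules can be sketched in a few lines of Python. This is an illustration of the stated rules only (OR first, then proximity, then AND, left to right within a level), not dtSearch's actual parser. The function name parenthesize is hypothetical, and the sketch assumes the only operators present are AND, OR, and w/N.

```python
import re


def _level(token: str) -> int:
    """Binding strength under the stated order: OR, then proximity, then AND."""
    upper = token.upper()
    if upper == "OR":
        return 3
    if re.fullmatch(r"W/\d+", upper):
        return 2
    if upper == "AND":
        return 1
    return 0  # an ordinary word or a parenthesis


def parenthesize(term: str) -> str:
    """Return the term with every grouping made explicit."""
    tokens = re.findall(r"\(|\)|[^\s()]+", term)
    pos = 0

    def parse_operand() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("term ends where a word was expected")
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = parse_expression(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return inner
        if token == ")" or _level(token):
            raise ValueError(f"unexpected token: {token}")
        # Adjacent plain words are kept together as one phrase.
        words = [token]
        while (
            pos < len(tokens)
            and tokens[pos] not in ("(", ")")
            and not _level(tokens[pos])
        ):
            words.append(tokens[pos])
            pos += 1
        return " ".join(words)

    def parse_expression(min_level: int) -> str:
        nonlocal pos
        left = parse_operand()
        while pos < len(tokens) and _level(tokens[pos]) >= min_level:
            operator = tokens[pos]
            pos += 1
            # level + 1 makes operators at the same level group left to right.
            right = parse_expression(_level(operator) + 1)
            left = f"({left} {operator} {right})"
        return left

    result = parse_expression(1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result


if __name__ == "__main__":
    print(parenthesize("apple AND banana OR grape w/5 mango"))
    print(parenthesize("stonewall OR stone w/2 wall"))
```

For the two example terms, the sketch reproduces the groupings given above, with an extra outer pair of parentheses on the second. Adding the intended parentheses to the input, as in stonewall OR (stone w/2 wall), changes the grouping accordingly.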
4. Regular Expressions
Challenge:
Along with searches for patterns, searches for terms containing certain special characters – those treated by dtSearch as operator characters – are among the most common uses for regular expressions. Because they still maintain their operator functions even when added to the index, operator characters typically cannot be searched for in standard search terms without returning many false hits or an unreliable result set.
Solution:
Convert terms containing special characters to regular expressions using the Unicode code point for the character in question.
Examples:
- In a single-word term where the special character is attached to the word, such as pin#, follow these steps:
- Add the # to the index by removing the # from the Spaces section and adding it to the Letters section of the index’s alphabet file.
- Look up the Unicode code point for the number sign (for example, by searching for “Unicode number sign”), which is 0023, and use it to rewrite the term: "##pin\u0023".
Note: The double hash sign at the beginning indicates to dtSearch that this term is a regular expression, the quotes (which must be straight quotes, not curly) set the term off from other potential component terms in the same larger term, and the \u indicates that what follows is a Unicode character.
- For a term like Butler & Jones, rewrite only the portion of the term that contains the ampersand. (Regular expressions in dtSearch do not recognize spaces.)
- Add the ampersand to the index by removing it from the Spaces section and adding it to the Letters section of the index’s alphabet file.
- The Unicode code point for the ampersand is 0026, so rewrite the term as Butler "##\u0026" Jones. Only the portion of the term that contains the ampersand is rewritten, so that the regular expression contains no spaces.
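The rewriting steps above can be sketched as a small Python helper for converting a list of terms. The function names to_regex_term and rewrite_phrase are hypothetical. The sketch escapes every non-alphanumeric character as a \u code point, which is an assumption that goes beyond the two characters (# and &) shown here, and it does nothing about the alphabet file, which still has to be edited in the index as described above.

```python
def to_regex_term(token: str) -> str:
    """Rewrite one space-free token as a quoted regular-expression term."""
    if any(ch.isspace() for ch in token):
        # The regular expression portion of a term must not contain spaces.
        raise ValueError("token must not contain spaces")
    # Letters and digits stay as typed; anything else becomes \uXXXX.
    # Assumption: characters in the Basic Multilingual Plane (four hex digits).
    body = "".join(
        ch if ch.isalnum() else f"\\u{ord(ch):04X}" for ch in token
    )
    # Straight quotes set the term off; ## marks it as a regular expression.
    return f'"##{body}"'


def rewrite_phrase(phrase: str) -> str:
    """Rewrite only the words of a phrase that contain special characters."""
    return " ".join(
        to_regex_term(word) if any(not ch.isalnum() for ch in word) else word
        for word in phrase.split()
    )


if __name__ == "__main__":
    print(rewrite_phrase("pin#"))
    print(rewrite_phrase("Butler & Jones"))
```

Rewriting word by word keeps spaces out of the regular expression, which matches the approach of converting only the portion of the term that contains the special character.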
To increase search effectiveness, our teams use Syntax Inspector™, a tool designed to support the Relativity-specific implementation of dtSearch. It validates search syntax and identifies technical errors that might compromise results. It also flags terms with multiple operators or multiple sets of grouping parentheses, so that the team can confirm the logical form of a query matches the intended target of the search.
The dtSearch system is a complicated but powerful search engine. Keeping these tips and tricks in mind can help make using dtSearch less challenging and produce better, more effective search results.
Caitlin Wilhelm
Linguistic Data Analyst at ProSearch Strategies, Inc