In the world of KYC, Google has become a ubiquitous component of client onboarding, CDD, EDD and Vendor Due Diligence best practices. Searches are often written once and then enshrined in procedures. As a supplement to checks against traditional sanctions, public record, PEP, criminal record, litigation and specialized adverse media providers, a Google search can reveal valuable, risk-relevant insights. However, a simple Google search on the subject name will return a great number of results, most of which are neither adverse nor relevant. Many financial institutions attempt to reduce the results to a more manageable quantity by using Boolean logic. That’s where it gets a little tricky.
Google’s search algorithm evolves
Google is often admired for its agility, and its ability to create, deploy, assess market demand for and decide the fate of new products in a matter of days. The search algorithms are similarly agile, constantly changing to improve the search experience. For this reason, the search string entered today may yield different results tomorrow. Why is this?
Google’s algorithms learn something about your preferences every time you search. Future search results are then ordered based on your implied areas of interest. So your employees’ lunchtime surfing activities could lead to different users seeing different results for the same search.
Publishers have access to a powerful range of SEO tools and are constantly working to push their pages as high up the search results list as possible. Google, for its part, changes the algorithms from time to time. A recent example is that when you search from a mobile device, priority is given to mobile-friendly content, potentially demoting relevant information.
You say tomato, I say tomato
Google operates country-specific domains, each of which presents search results with a geographical preference. So if, like the former British Empire, the sun never sets on your KYC process, you could see significantly different news results depending on which country the search is conducted in. Try searching for “Parliament” at www.google.com, www.google.co.il and www.google.co.uk.
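To make the comparison repeatable, the same query can be turned into one link per domain. This is a minimal sketch: the three domains come from the paragraph above, while the `/search?q=` URL pattern and the helper name `search_urls` are assumptions made for illustration. Google may still localize or redirect results based on where the request comes from.

```python
from urllib.parse import urlencode

# Domains taken from the article; extend the list to match your own footprint.
DOMAINS = ["www.google.com", "www.google.co.il", "www.google.co.uk"]


def search_urls(query, domains=DOMAINS):
    """Build one search URL per country domain for the same query.

    Assumes the common https://<domain>/search?q=<query> pattern.
    """
    encoded = urlencode({"q": query})  # handles spaces and quotation marks
    return {domain: f"https://{domain}/search?{encoded}" for domain in domains}


if __name__ == "__main__":
    for domain, url in search_urls("Parliament").items():
        print(domain, url)
```

Opening the three links side by side shows how much the ordering depends on the domain alone.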
Do you speak Google?
Seasoned investigators, IT professionals and programmers share a common language of syntax, operators, wildcards and other tools that help target and refine searches. The challenge is that Google’s syntax is dynamic and unique. For example:
- Google discontinued use of the logical operator + some time ago, so Yankees + Florida is the same as Yankees Florida
- The AND goes without saying. Google uses the implied AND, so Yankees AND Florida is the same as Yankees Florida
- Google ignores parentheses, so Villain AND (convicted OR arraigned OR accused) is the same as Villain convicted OR arraigned OR accused
- How to be approximate? Google doesn’t use ~ (tilde). By default, it will include terms close to the search word, unless you specify exact searching by putting the search word in quotation marks.
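The rules in this list can be applied mechanically to an existing search string to see what Google is effectively being asked. The sketch below is an illustration only: the function name `simplify_for_google` is hypothetical, and it simply encodes the behaviour described above rather than any official Google specification. It does not treat quoted phrases specially, so operators inside quotation marks would be stripped as well.

```python
import re


def simplify_for_google(query):
    """Remove operators that, per the list above, do not change the search."""
    # Parentheses are ignored, so drop them.
    query = re.sub(r"[()]", " ", query)
    # A + or ~ standing alone or at the start of a word has no effect.
    query = re.sub(r"(?<!\S)[+~]", " ", query)
    # AND is implied between terms, so the explicit keyword is redundant.
    query = re.sub(r"\bAND\b", " ", query)
    # Collapse the whitespace left behind by the removals.
    return " ".join(query.split())


if __name__ == "__main__":
    print(simplify_for_google("Villain AND (convicted OR arraigned OR accused)"))
```

Running legacy search strings through a helper like this makes it easier to spot where a procedure relies on grouping or operators that are not being honoured.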
Of course, in the above examples the search results are the same, so it doesn’t really matter if you have redundant AND, parentheses, ~ or + operators. However, there are other scenarios in which knowing how to speak Google will improve your results:
- WILDCARDS BEHAVE DIFFERENTLY. Do you use ! or * with a truncated word to search all related combinations? For example, you may think that by searching for Crim! or Crim* you are looking for criminals, crimes or criminology, but in fact you are just looking for crim, as Google ignores the wildcard. Try this one and you will find you have to dig through many pages of crim before you find any criminals. Another example we often see is illeg*. Try this search and you will learn about the Hebrew word for a stammerer, a lovely Baalbec door at the J. Paul Getty Museum and the Illeg family tree before you see anything illegal. Google does use the * wildcard, but only for complete words, so Wild * rafting would return results about wild water rafting or wild river rafting, but Wild* rafting is the same as searching wild rafting.
- DO YOU USE TOO MANY WORDS? Google searches are limited to 32 keywords; anything beyond this is ignored. Operators and punctuation such as OR, site:, -, @ or # do not count toward the 32. For example, search for:
“one OR two OR three OR four OR five OR six OR seven OR eight OR nine OR ten OR eleven OR twelve OR thirteen OR fourteen OR fifteen OR sixteen OR seventeen OR eighteen OR nineteen OR twenty OR twentyone OR twentytwo OR twentythree OR twentyfour OR twentyfive OR twentysix OR twentyseven OR twentyeight OR twentynine OR thirty OR thirtyone OR thirtytwo OR thirtythree”
and you will receive the following error message:
“thirtythree” (and any subsequent words) was ignored because we limit queries to 32 words.
- YOU ARE NOT SUBTRACTING? Google uses the “-” operator instead of the more traditional NOT to exclude findings containing the word immediately following it. For example, Yankees -Florida excludes results that mention Florida. This can be very powerful, saving time and reducing the noise by eliminating immaterial or undesirable results.
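The three pitfalls above lend themselves to an automated check before a search string is written into a procedure. What follows is a rough sketch, not a definitive implementation: `lint_query` and its warning texts are hypothetical, the 32-keyword limit is the figure quoted above, and the keyword count is an approximation that only skips OR and free-standing punctuation.

```python
import re

MAX_KEYWORDS = 32  # limit quoted in the article


def lint_query(query):
    """Return warnings for habits that, per the article, Google does not honour."""
    warnings = []
    tokens = query.split()

    for token in tokens:
        # A * or ! glued to the end of a word (crim*, illeg!) is a truncation
        # wildcard; a free-standing * stands for a whole word and is fine.
        if re.fullmatch(r"\w+[*!]", token):
            warnings.append(f"truncation wildcard ignored: {token}")

    # NOT is not an exclusion operator; a leading minus sign is.
    if "NOT" in tokens:
        warnings.append("use -term instead of NOT term")

    # Approximate keyword count: skip OR and tokens that are pure punctuation.
    keywords = [t for t in tokens if t != "OR" and re.search(r"\w", t)]
    if len(keywords) > MAX_KEYWORDS:
        ignored = len(keywords) - MAX_KEYWORDS
        warnings.append(f"{ignored} keyword(s) over the {MAX_KEYWORDS}-word limit")

    return warnings


if __name__ == "__main__":
    for warning in lint_query("Villain NOT crim* illeg!"):
        print(warning)
```

A clean query such as Villain -Florida returns an empty list, so the check can be run over every saved search string and only the problem cases need review.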
Call to Action
- Learn more. There are many resources available to help you understand how to get the best search results. A good place to start is Google’s own support resource: https://support.google.com/websearch/answer/2466433?hl=en
- Review your search criteria and update to get the results you want
- For a consistent and repeatable process, execute your Google searches, along with the rest of your data collection, directly from within your KYC workflow software
- Repeat often