A variety of companies offer some form of a comprehensive risk engine for predicting which transactions and users are riskier. Here we’ll discuss some of the methods used for these engines—and the strengths and limitations associated with each.
Boolean Rules-Based Systems
“Boolean” means that the result will be either “true” or “false.” So a Boolean rule will either trip, or not, and change the outcome of a transaction as a result. One way to set up these rules is to have some rules cause an order to be accepted, some cause an order to be rejected, and some cause the order to fall into a manual review queue. If no rules are tripped, the order is accepted by default.
You might have a simple rule set like this:
The system goes through the rules one by one. As soon as a rule trips, it takes that rule's action and skips all the remaining rules, so the order of the rules matters. In the example above, if an order comes through on an account that’s 100 days old, but that account shares a device signature with four other accounts, the order would automatically be accepted because the “greater than 90 days” rule comes first.
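Here is a minimal Python sketch of that first-match behavior. It uses a hypothetical `Order` record and only the two rules the text describes: accept accounts older than 90 days, and reject when more than 3 other accounts share the device signature. The field names, the rule list, and the `evaluate` helper are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Order:
    # Hypothetical fields, named after the rules discussed in the text.
    account_age_days: int
    accounts_sharing_device: int  # other accounts seen with the same device signature


# A rule is (name, predicate, action); action is "accept", "reject", or "review".
Rule = Tuple[str, Callable[[Order], bool], str]

RULES: List[Rule] = [
    ("account older than 90 days", lambda o: o.account_age_days > 90, "accept"),
    ("device shared with more than 3 accounts", lambda o: o.accounts_sharing_device > 3, "reject"),
]


def evaluate(order: Order, rules: List[Rule]) -> Tuple[str, Optional[str]]:
    """Return (action, name of the tripped rule); the first rule that trips wins."""
    for name, predicate, action in rules:
        if predicate(order):
            return action, name  # skip every rule after this one
    return "accept", None  # nothing tripped: accept by default


# The order from the text: a 100-day-old account sharing a device with four others.
order = Order(account_age_days=100, accounts_sharing_device=4)
print(evaluate(order, RULES))                  # ('accept', 'account older than 90 days')
print(evaluate(order, list(reversed(RULES))))  # ('reject', 'device shared with more than 3 accounts')
```

Reversing the list flips the outcome for the same order. That is the ordering sensitivity the text describes.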
The strength of a Boolean rule system is that it’s easy to understand which orders will be accepted, rejected, and reviewed—so long as the rule system itself is uncomplicated. Where you run into issues is when you want to get more specific. Say, for instance, that you have the above rule system but find that high-velocity IP addresses are tied to fraud separately from device signatures. You insert a new rule:
Now, though, you find that your team is reviewing a large number of orders from university IPs, where fraud is relatively rare. So you insert another rule:
This is still pretty straightforward, but it becomes trickier to sort through the overlap in your head. What about a new account with a university IP address that has a device in common with four other accounts and an IP address in common with five? Now imagine there are a hundred rules instead of five.
Time has to be spent carefully curating and arranging these rules, and it can be difficult to determine the overall benefit each individual rule is contributing. If we have a rule, as above, causing orders with more than 3 accounts sharing an IP to fall into manual review, how do we determine the effect of removing that rule? We need to determine how many fraudulent orders and non-fraudulent orders were for under $200 and had greater than 3 accounts on the same IP, keeping in mind that orders from accounts over 90 days old or coming from .edu IP addresses would have been accepted and orders where more than 3 accounts shared the same device would have been rejected. It’s possible to do, but difficult and time-consuming, and it gets exponentially more difficult the more rules you have to take into consideration.
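To measure a rule's contribution, you can replay labeled historical orders through the rule list twice, once with the rule and once without it, and count which outcomes change. The sketch below assumes a hypothetical four-rule list and a handful of made-up orders. `rule_impact` is an illustrative helper, not part of any product.

```python
from typing import Callable, Dict, List, Tuple

Order = Dict[str, object]
Rule = Tuple[str, Callable[[Order], bool], str]

# Hypothetical rules in priority order, loosely following the ones discussed in the text.
RULES: List[Rule] = [
    ("account older than 90 days", lambda o: o["account_age_days"] > 90, "accept"),
    (".edu IP address", lambda o: o["edu_ip"], "accept"),
    ("more than 3 accounts on device", lambda o: o["accounts_on_device"] > 3, "reject"),
    ("more than 3 accounts on IP", lambda o: o["accounts_on_ip"] > 3, "review"),
]

# Made-up, already-labeled orders used only to illustrate the replay.
HISTORY: List[Order] = [
    {"account_age_days": 10, "edu_ip": False, "accounts_on_device": 1, "accounts_on_ip": 5, "is_fraud": True},
    {"account_age_days": 20, "edu_ip": False, "accounts_on_device": 0, "accounts_on_ip": 4, "is_fraud": False},
    {"account_age_days": 200, "edu_ip": False, "accounts_on_device": 0, "accounts_on_ip": 6, "is_fraud": False},
    {"account_age_days": 5, "edu_ip": True, "accounts_on_device": 0, "accounts_on_ip": 8, "is_fraud": False},
    {"account_age_days": 5, "edu_ip": False, "accounts_on_device": 5, "accounts_on_ip": 5, "is_fraud": True},
]


def evaluate(order: Order, rules: List[Rule]) -> str:
    """First-match Boolean evaluation; accept when no rule trips."""
    for _name, predicate, action in rules:
        if predicate(order):
            return action
    return "accept"


def rule_impact(history: List[Order], rules: List[Rule], rule_name: str) -> Dict[str, int]:
    """Count outcome changes, split by fraud label, if `rule_name` were removed."""
    without = [rule for rule in rules if rule[0] != rule_name]
    changes: Dict[str, int] = {}
    for order in history:
        before, after = evaluate(order, rules), evaluate(order, without)
        if before != after:
            key = f"{'fraud' if order['is_fraud'] else 'legit'}: {before} -> {after}"
            changes[key] = changes.get(key, 0) + 1
    return changes


print(rule_impact(HISTORY, RULES, "more than 3 accounts on IP"))
# {'fraud: review -> accept': 1, 'legit: review -> accept': 1}
```

In this toy history, dropping the IP rule would let one fraudulent order through and spare one legitimate order a manual review.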
The more rules you add to increase accuracy, the more difficult it becomes to take them all into consideration.
Weighted Rules-Based Systems
One way to mitigate the problem of binary rules is to allow the rules to operate in shades of gray. Each rule has a point value, positive or negative, which we add together. We then set thresholds saying we’ll accept any orders below, say, 500 points, and reject any at or above 1000 points. Any between 500 and 999 will fall into a manual review queue.
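The following sketch implements weighted scoring with the thresholds above: accept below 500, reject at 1000 or more, and review in between. Only the -250 value for the .edu rule comes from the text. The other rules and their point values are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

Order = Dict[str, object]
# (name, predicate, points). Only the .edu value (-250) appears in the text; the rest are hypothetical.
WeightedRule = Tuple[str, Callable[[Order], bool], int]

RULES: List[WeightedRule] = [
    ("account older than 90 days", lambda o: o["account_age_days"] > 90, -400),
    ("more than 3 accounts on device", lambda o: o["accounts_on_device"] > 3, 800),
    ("more than 3 accounts on IP", lambda o: o["accounts_on_ip"] > 3, 500),
    (".edu IP address", lambda o: o["edu_ip"], -250),
]
ACCEPT_BELOW = 500  # scores below this are accepted
REJECT_AT = 1000    # scores at or above this are rejected


def score(order: Order, rules: List[WeightedRule]) -> int:
    """Add up the points of every rule that trips; no rule short-circuits the others."""
    return sum(points for _name, predicate, points in rules if predicate(order))


def decide(total: int) -> str:
    """Apply the accept/review/reject thresholds from the text."""
    if total >= REJECT_AT:
        return "reject"
    if total >= ACCEPT_BELOW:
        return "review"
    return "accept"


# The overlapping case from the Boolean section: new account, .edu IP,
# device shared with four other accounts, IP shared with five.
order = {"account_age_days": 3, "edu_ip": True, "accounts_on_device": 4, "accounts_on_ip": 5}
total = score(order, RULES)
print(total, decide(total))  # 1050 reject
```

Every rule contributes, so the overlapping case that was hard to reason about under Boolean rules reduces to a sum: 800 + 500 - 250 = 1050.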
Here are our modified rules:
This rule system makes it easier to determine how effective a particular rule is, as long as you can track which orders had which rules activated. With that information, you can potentially do a what-if analysis, determining how many of both fraud and non-fraud orders would have been accepted, rejected, and manually reviewed if, say, the .edu rule were changed from -250 points to -500 or to 0.
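If each historical order stores which rules it tripped, a what-if analysis only has to re-add the points using new weights. This sketch uses hypothetical rule names, weights, and labeled orders. Only the -250, -500, and 0 values for the .edu rule come from the text.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Each historical order keeps the set of rules it tripped and its fraud label.
# Rule names, weights, and orders are hypothetical.
History = List[Tuple[FrozenSet[str], bool]]

WEIGHTS: Dict[str, int] = {"new_account": 300, "shared_ip": 500, "edu_ip": -250, "high_order_value": 600}

HISTORY: History = [
    (frozenset({"new_account", "shared_ip", "edu_ip"}), False),
    (frozenset({"new_account", "shared_ip"}), True),
    (frozenset({"shared_ip", "high_order_value"}), True),
    (frozenset({"shared_ip", "edu_ip", "high_order_value"}), False),
]


def decide(total: int) -> str:
    """Accept below 500 points, reject at 1000 or more, review in between."""
    if total >= 1000:
        return "reject"
    return "review" if total >= 500 else "accept"


def what_if(history: History, weights: Dict[str, int]) -> Counter:
    """Rescore the stored rule hits with `weights` and tally (outcome, is_fraud) pairs."""
    tally: Counter = Counter()
    for tripped, is_fraud in history:
        total = sum(weights[name] for name in tripped)
        tally[(decide(total), is_fraud)] += 1
    return tally


for edu_points in (-250, -500, 0):
    results = what_if(HISTORY, {**WEIGHTS, "edu_ip": edu_points})
    print(edu_points, sorted(results.items()))
```

In this toy data, moving the .edu rule to -500 shifts one legitimate order from review to accept without letting any fraud through. Setting it to 0 pushes a legitimate order into reject.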
As a step up from there, you could determine the impact on fraud, sales, and review time if you made several changes at once. What if you changed the .edu rule to -500 points and changed the order value rule to 750 points? The problem that you eventually run into, however, is that with many dozens of rules in place, thinking of all the possible combinations of point value changes is impossible to do manually. Imagine, for instance, that we want to calculate our reject, review, and accept rates using our 5-rule system, and calculate the changes to those rates if we adjusted each rule up or down by 25 points, or left it alone.
There are 3 possible changes (+25, 0, -25) for each of the five rules, so we need to do the math on 3⁵ (3 × 3 × 3 × 3 × 3) possible combinations, or 243. Doing that manually in a spreadsheet, one combination at a time, would be very time-consuming. Doing it programmatically is faster, but also requires knowing a fair bit about how to automate spreadsheet calculations or how to perform calculations outside of a spreadsheet automatically.
To determine the impact of 5 rules with 3 different types of point changes, you would have to go through 243 combinations.
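Enumerating every ±25-point adjustment is a Cartesian product, which Python's `itertools.product` generates directly. The weights and the four-order history below are hypothetical. The loop rescores the history once per combination and records the accept, review, and reject rates.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical weights for a five-rule system and a tiny history of rule hits.
WEIGHTS: Dict[str, int] = {
    "new_account": 300,
    "shared_device": 700,
    "shared_ip": 500,
    "edu_ip": -250,
    "high_order_value": 600,
}
HISTORY: List[FrozenSet[str]] = [
    frozenset({"new_account", "shared_ip"}),
    frozenset({"shared_device", "shared_ip"}),
    frozenset({"edu_ip", "shared_ip", "high_order_value"}),
    frozenset({"new_account"}),
]
STEPS = (25, 0, -25)  # the three adjustments tried for every rule


def rates(weights: Dict[str, int]) -> Tuple[float, float, float]:
    """Return (accept, review, reject) rates for HISTORY under `weights`."""
    counts = {"accept": 0, "review": 0, "reject": 0}
    for tripped in HISTORY:
        total = sum(weights[name] for name in tripped)
        outcome = "reject" if total >= 1000 else "review" if total >= 500 else "accept"
        counts[outcome] += 1
    n = len(HISTORY)
    return counts["accept"] / n, counts["review"] / n, counts["reject"] / n


names = list(WEIGHTS)
results: Dict[Tuple[int, ...], Tuple[float, float, float]] = {}
for deltas in product(STEPS, repeat=len(names)):  # every (+25, 0, -25) choice per rule
    adjusted = {name: WEIGHTS[name] + delta for name, delta in zip(names, deltas)}
    results[deltas] = rates(adjusted)

print(len(results))  # 243
```

Each additional rule multiplies the number of combinations by 3.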
Now let’s say instead of 3 possible changes, we want to look at 9: +200, +100, +50, +25, 0, -25, -50, -100, and -200 points. And instead of five rules, we have a hundred. We now need to calculate the accept, review, and reject rates on our transactions 9¹⁰⁰ times, which is roughly 2.7 × 10⁹⁵ calculations, a number with 96 digits. You definitely can’t do that yourself one at a time, and no computer could work through every combination by brute force either.
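These counts follow from the multiplication rule. If each of $n$ rules can independently take any of $k$ adjustments, the number of combinations is $k^n$, with the number of rules as the exponent. A base-10 logarithm gives the order of magnitude for the larger case:

$$
N = k^{n}, \qquad 3^{5} = 243, \qquad 9^{100} = 10^{100 \log_{10} 9} \approx 10^{95.42} \approx 2.7 \times 10^{95}
$$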
Fortunately, solving complex math problems like this has been worked on for longer than ecommerce websites have been worried about fraud, and there are some useful techniques available.
Machine Learning
Machine learning is, essentially, an attempt to use statistical correlations between a large number of data points to predict missing values for future sets of points. There are a lot of ways to do this, but for fraud prevention what’s used most often is supervised classification. Supervised means that the system is told in advance which already-reviewed orders are fraud, and then asked to predict whether another order is fraud based on what it’s seen so far. Classification means that rather than predicting a value on a scale (say, the price of a stock tomorrow), the model is being asked to predict which bucket to place an item in (say, an order into buckets labeled “fraud” and “not fraud”).
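To make “supervised classification” concrete, here is logistic regression trained by gradient descent in plain Python. Logistic regression is one of the simplest supervised classifiers. It appears here only for illustration, and commercial risk engines may use very different models. The two binary features and the eight labeled orders are hypothetical. The labels `y` are the supervised part: they tell the model which past orders were fraud.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], x: Sequence[float]) -> float:
    """Probability of the positive ("fraud") class; w[0] is the bias."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X: List[Sequence[float]], y: List[int], lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical labeled orders. Features: [new_account, shared_device]; label 1 = fraud.
X = [[1, 1], [1, 1], [1, 1], [0, 0], [0, 0], [0, 0], [1, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
w = train_logistic(X, y)
print(f"new account, shared device: {predict_proba(w, [1, 1]):.2f}")
print(f"old account, unshared:      {predict_proba(w, [0, 0]):.2f}")
```

After training, the model assigns a high fraud probability to new accounts on shared devices and a low one to established accounts. It learned this from the labeled examples, not from a hand-written rule.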
The math behind these predictions is fairly complex, but a merchant using a solution like this will generally end up with either a recommendation (“accept” or “reject” or possibly “review”) or a score (say, from 0 to 100) that they can take action on.
This is an example of a black box solution, where you feed data in and get a result out without directly controlling how that result is generated. In the case of a machine learning solution, the methodology behind the result may or may not be visible to the merchant.
Machine learning solutions have the potential to be extremely accurate compared to rules-based systems, while requiring a fraction of the maintenance time. On the other hand, how those generally accurate results came about is more difficult to ascertain, since the math creating the prediction is “under the hood,” so to speak.
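The merchant decides how a score turns into an action. This sketch scales a model's fraud probability to a score from 0 to 100 and maps that score to a recommendation. The cutoffs of 50 (review) and 80 (reject) are hypothetical.

```python
def to_score(probability: float) -> int:
    """Scale a fraud probability in [0, 1] to an integer score from 0 to 100."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(probability * 100)


def recommend(score: int, review_at: int = 50, reject_at: int = 80) -> str:
    """Map a 0-100 score to a recommendation; both cutoffs are hypothetical."""
    if score >= reject_at:
        return "reject"
    if score >= review_at:
        return "review"
    return "accept"


print(to_score(0.07), recommend(to_score(0.07)))  # 7 accept
print(to_score(0.91), recommend(to_score(0.91)))  # 91 reject
```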
Other Black Box Systems
A solution provider might provide a black box system (data in, recommendation or score out) that relies on methods other than machine learning. For example, the Boolean five-rule system above could easily be offered as a (probably ineffective) black box system. Some solution providers employ proprietary methods for predicting which users or transactions are likely to be fraudulent, and hide their rules for that reason.