If you haven't already, read KRA, Safaricom & You. Go on. I'll wait.
That post has raised quite a bit of interest, and so, a follow-up.
First of all, there is nothing wrong with data mining. If you are a serious company you hire a guy like me to crunch your data and give you new, non-obvious insights. You will get insights like
- How to target your products
- Which to discontinue
- Which to invest in
- Whether an advertising campaign is working
- Usage patterns
- Purchase patterns
- Customer patterns
What I OBJECT STRONGLY to is government mining our transactional information because we might be evading taxes. And so should you!
Now, let us get back to MPesa. I would like to discuss data mining in a bit more detail.
Again, I'm using MPesa because the numbers from the other providers are of nuisance value.
An MPesa transaction has the following information
- Sender phone number
- Recipient phone number
- MPesa Outlet
To register for MPesa, or indeed to get your line, you provided a raft of information about yourself. The interesting bits are
- ID Number
- Date of birth
Let us look at that MPesa outlet. An MPesa outlet, obviously, must register itself. Therefore the following information is available
- Outlet name
- PIN Number
- Physical Location
- Owner name, ID number
- GPS Co-ordinates *
- Opening time *
- Closing time *
The starred items are ones I am not sure Safaricom collects, but if I were them, I would.
Now, 6 months of this data is data mining gold. I'd frankly be astonished if Safaricom did not mine this database.
There are some quick, obvious things that you can derive to improve service delivery.
Which MPesa outlets open on time
Given an outlet that claims to open at 8, if the earliest MPesa transaction each day falls between 9:00 and 9:30 over a continuous period, it is likely that outlet does not open on time.
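A rough sketch in Python of how that check might look. Everything here is an assumption for illustration: the function name, the 30-minute grace period, the 20-day minimum history, the use of a median, and the sample data. None of it reflects how Safaricom actually stores or queries anything.

```python
from datetime import time
from statistics import median

def opens_late(first_txn_times, claimed_open, grace_minutes=30, min_days=20):
    """True if the typical first transaction of the day is well past the claimed opening time."""
    if len(first_txn_times) < min_days:
        return False  # not enough history to say anything
    as_minutes = [t.hour * 60 + t.minute for t in first_txn_times]
    claimed = claimed_open.hour * 60 + claimed_open.minute
    # Median rather than mean, so one unusually early or late day does not swing the result.
    return median(as_minutes) - claimed > grace_minutes

# Hypothetical outlet: claims 8:00, first transaction lands between 9:05 and 9:24.
first_seen = [time(9, 5 + day % 20) for day in range(25)]
late = opens_late(first_seen, time(8, 0))
```

The same function works for closing time if you feed it the last transaction of each day and flip the comparison.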
Which MPesa outlets close on time
Same as above, only for closing time.
Which MPesa outlets should be closed
Given you have the GPS co-ordinates, you can position the MPesa outlets on a map. If you find there are four next to each other, A, B, C and D, and A, B and C each do on average 30 transactions a day but D does just 5, you can probably close D.
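Here is one way the A, B, C, D comparison could be sketched. The 200 m radius, the "less than a quarter of the neighbours' average" rule, the co-ordinates and the function names are all invented for the example; the haversine formula is just a standard way of getting the distance between two GPS points.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def closure_candidates(outlets, radius_km=0.2, ratio=0.25):
    """outlets maps name -> (lat, lon, average daily transactions)."""
    flagged = []
    for name, (lat, lon, volume) in outlets.items():
        # Volumes of every other outlet within the radius.
        nearby = [v for other, (la, lo, v) in outlets.items()
                  if other != name and haversine_km((lat, lon), (la, lo)) <= radius_km]
        if nearby and volume < ratio * (sum(nearby) / len(nearby)):
            flagged.append(name)
    return flagged

# Hypothetical cluster of four outlets a few metres apart.
outlets = {
    "A": (-1.2841, 36.8155, 30),
    "B": (-1.2842, 36.8156, 30),
    "C": (-1.2843, 36.8157, 30),
    "D": (-1.2844, 36.8158, 5),
}
to_review = closure_candidates(outlets)
```

With these made-up numbers only D is flagged, which matches the reasoning above.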
Where do you need more MPesa outlets
Example as above. If you find A, B, C and D are processing transactions continuously from opening to closing time (i.e. there are no hourly spikes), and you cross-reference that with the average number of transactions processed across all outlets, it is likely they are working flat out, in which case you might need more outlets to absorb the load.
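"Flat out" can be approximated as "busy every hour, and much busier than the typical outlet". A minimal sketch, assuming hourly transaction counts per outlet are available; the thresholds and names are hypothetical.

```python
from statistics import mean, pstdev

def looks_saturated(hourly_counts, network_hourly_avg, flatness=0.15, load=1.5):
    """True if volume is both flat across the day and well above the network average."""
    avg = mean(hourly_counts)
    if avg == 0:
        return False
    # Coefficient of variation: a small value means no hourly spikes.
    variation = pstdev(hourly_counts) / avg
    return variation <= flatness and avg >= load * network_hourly_avg

# Hypothetical outlets, compared against an assumed network average of 12 an hour.
busy_all_day = looks_saturated([40, 42, 41, 39, 40, 41, 40, 42], network_hourly_avg=12)
lunch_spike_only = looks_saturated([10, 10, 10, 10, 200, 10, 10, 10], network_hourly_avg=12)
```

The first outlet is flagged as saturated; the second is not, because its volume is a single spike rather than a steady load.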
Which MPesa outlets have a demographic profile
This is more interesting. Since you have the sender's details, you can derive things like the modal age of customers at a particular MPesa outlet. By modal age I mean the age that occurs most frequently among the senders at that outlet.
In other words, you can find that at one outlet most visitors are between 25 and 30, at another 18 to 23, and at another 40 to 50.
This is useful information for any competent marketing person. Or a practical person e.g. in the place where most visitors are 40-50 Safaricom can advise the outlets to get chairs for customers to sit on as they wait.
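The modal age calculation is a few lines of code. This sketch groups ages into fixed five-year bands, which is my simplification (the bands in the text are only examples), and the sample ages are made up.

```python
from collections import Counter

def modal_age_band(ages, width=5):
    """Return the (low, high) age band that occurs most often among the senders."""
    bands = Counter((age // width) * width for age in ages)
    start, _ = bands.most_common(1)[0]
    return (start, start + width - 1)

# Hypothetical sender ages seen at one outlet.
band = modal_age_band([26, 27, 29, 25, 41, 33, 28, 19])
```

Run it per outlet and you have a crude demographic profile for each one.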
Which are the peak times for transactions
Self explanatory. You might find for example on average an outlet does 10 transactions an hour but at lunch time it spikes to 200. Then it drops back to 10.
You find this outlet cannot handle the spike so customers have to queue.
Dilemma. If you open a second outlet, it will likely be idle. If you do nothing: customer dissatisfaction.
Solution: something like a portable MPesa outlet (a van or something) that can go there at lunch time, absorb the load and then leave.
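Spotting the spike itself is simple. A minimal sketch, assuming you have transaction counts per hour for an outlet; the "five times the median hour" rule and the names are hypothetical.

```python
from statistics import median

def peak_hours(hourly_counts, factor=5):
    """hourly_counts maps hour of day -> transactions. Returns the hours that dwarf the typical hour."""
    baseline = median(hourly_counts.values())
    if baseline == 0:
        return []
    return sorted(hour for hour, count in hourly_counts.items() if count >= factor * baseline)

# Hypothetical outlet: 10 transactions an hour from 8:00 to 17:00, 200 at lunch time.
counts = {hour: 10 for hour in range(8, 18)}
counts[13] = 200
spikes = peak_hours(counts)
```

The hours that come back are the ones where the van would need to show up.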
What is the average time it takes to complete a transaction
Self explanatory. If you remember, the initial forms you had to fill collected a lot more detail than they do now. Someone must have analyzed these numbers and optimized the process.
And so on. There are tons of other things that you can look out for but those examples should suffice.
Let's move on to the transactions themselves.
Remember this information is at your disposal
- Sender name
- Sender ID number
- Sender gender
- Sender age
- Recipient name
- Recipient ID number
- Recipient gender
- Recipient age
- MPesa outlet name
- MPesa outlet location
Armed with a bit of mathematics, economics and psychology, mining this information will yield a GOLD MINE of information. Let me reiterate: anyone with access to both this data and data mining expertise OWNS YOU.
If that alone is a gold mine, Safaricom is sitting on a gold mine next to oil and platinum deposits for the excellent reason that they also have access to your call records.
In other words, they can cross-reference your call and your MPesa records and mine that bad boy still further. Add to this the SMS database and this is paradise.
You can derive a treasure trove of information from this, over and above the examples I gave in my previous post.
Over and above who you are sending the money to, there is a lot of context to be gleaned if we can guesstimate why you are sending the money.
Let us take an example of how end to end mining would work.
Let me again repeat: data mining is premised on PROBABILITY, not certainty. Some of the assumptions may be wrong. But usually, you can derive pretty good confidence levels.
0721 000000 sends 5,000 to 0722 000000 at 2.00 AM, via his phone.
Let's get started.
First of all, let us build a profile of both sender and recipient.
0721 000000 maps to John Kamau, aged 37. He has been a customer since 2000.
0722 000000 maps to Jamie Omondi, who is not a male as first thought, but a female, aged 32, a customer since 2003.
Next, let us analyze the context.
A 2.00 AM transaction is unusual. This is unlikely to be paying for something. Let us hop over to the phone logs database. Aha. John and Jamie have in the past made calls to each other.
We can therefore infer that they know each other. Therefore that transfer was probably either some emergency or Jamie had a pressing bill that she needed to pay.
The next bit is to check if there are any subsequent transactions where Jamie is the sender.
Oho! Lipa Na MPesa till number 000000 received a payment of 4,500 from Jamie 5 minutes after she got the money from John.
Have there been any other payments from Jamie to that till number? Yes. On average, twice a month, over a 6 month period.
From the till number we can determine the business it was registered to. Turns out it is Sky Lounge, a swanky bar.
Have there been any other payments to tills belonging to bars? Yes! 6 other bars / hotels over the same 6 month period.
We can then infer that Jamie probably drinks. Given the profiles of the outlets she drinks at, she probably doesn't drink Senator, but more likely spirits and cocktails.
So, if Safaricom were to decide to license targeted customer profile databases and KBL requests and pays for that, guess whose details would be on that database?
Or if Safaricom decides to do context-sensitive advertising. Once Jamie logs in to her Gmail via her Bamba modem, Safaricom can tie her traffic to her number. And can therefore serve appropriate ads (Smirnoff, etc).
Relax, I said IF!
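The step where we checked what Jamie did right after receiving the money can be sketched like this. The record layout, the 10-minute window, the dates and the third transaction are all invented; only the 5,000 transfer at 2.00 AM and the 4,500 payment five minutes later come from the example.

```python
from datetime import datetime, timedelta

def follow_on_payments(txns, received, window=timedelta(minutes=10)):
    """Payments the recipient made shortly after receiving money, no larger than what they received."""
    start = received["at"]
    return [
        t for t in txns
        if t["sender"] == received["recipient"]
        and start < t["at"] <= start + window
        and t["amount"] <= received["amount"]
    ]

# Hypothetical transaction log; the dates are placeholders.
transfer = {"sender": "0721 000000", "recipient": "0722 000000",
            "amount": 5000, "at": datetime(2013, 1, 5, 2, 0)}
txns = [
    transfer,
    {"sender": "0722 000000", "recipient": "till 000000",
     "amount": 4500, "at": datetime(2013, 1, 5, 2, 5)},
    {"sender": "0722 000000", "recipient": "till 000000",
     "amount": 1200, "at": datetime(2013, 1, 19, 23, 40)},
]
linked = follow_on_payments(txns, transfer)
```

Only the 4,500 payment is linked to the transfer. The later payment to the same till is ordinary history, which is what the "twice a month" query would pick up.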
Going back to John.
What other transactions has John made?
John has made at least one transaction every month via Pay Bill to a hair salon. The average amount is 5,000 which means it is unlikely he is paying for himself. There is probably a lady in his life, who he accompanies to the salon.
It also turns out he has used Lipa Karo to 3 different schools. Ergo he either is a father with 3 children or he has 3 dependants he pays school fees for.
John also pays DSTV via MPesa. Premium package (7,000) without fail on the 3rd of each month.
John also pays Access Kenya (10,400) for his home internet connection, also on the 3rd of each month.
John also pays Kenya Power an average of 4,000 a month in power, which says something about where he lives: he likely does not live alone.
His bills say a lot about his financial abilities.
In fact, none of his bills is paid earlier than the 3rd.
Looking closer, on the morning of the 3rd of every month John makes a 30,000 deposit into his MPesa from his bank which he uses to pay his bills. This suggests that likely he has a regular income that clears on the 3rd.
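Guessing the pay day from the deposits is a frequency count. A minimal sketch, assuming a list of deposit dates; the 80% threshold, the function name and the dates are hypothetical.

```python
from collections import Counter
from datetime import date

def likely_payday(deposit_dates, min_share=0.8):
    """Day of the month on which most deposits land, or None if there is no clear pattern."""
    if not deposit_dates:
        return None
    day, hits = Counter(d.day for d in deposit_dates).most_common(1)[0]
    return day if hits / len(deposit_dates) >= min_share else None

# Hypothetical: six monthly bank-to-MPesa deposits, all on the 3rd.
deposits = [date(2013, month, 3) for month in range(1, 7)]
payday = likely_payday(deposits)
```

This is the probability point again: if the deposits were scattered across the month, the function would return nothing rather than guess.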
John also makes many payments to Steers. As frequently as 3 times a week, averaging 700. The payments are always in the evening.
This suggests that John eats a lot of take-away. Thus it is unlikely he is living with children (no one feeds kids burgers 3 days a week). This is supported by his spending at Steers: 700 is pretty much a meal for one.
There is also a payment of 3,000 at the end of every month to a number that does not appear in any of his call logs.
This same number also received the same 3,000 from 4 other different numbers, with the same pattern. No calls.
Who do you send money to but never call? Either some nefarious criminal enterprise or, much more likely, some sort of housekeeper.
But let me not belabour the point. A lot of insight can be derived from data mining, and this is not necessarily a bad thing.
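Finding "people you pay but never call" is a set difference between the payment records and the call logs. The record shapes and names below are made up for the sketch.

```python
def paid_but_never_called(payments, calls, payer):
    """Numbers the payer has sent money to without any call in either direction."""
    paid = {p["to"] for p in payments if p["from"] == payer}
    talked_to = {c["to"] for c in calls if c["from"] == payer}
    talked_to |= {c["from"] for c in calls if c["to"] == payer}
    return paid - talked_to

# Hypothetical records: John pays two numbers but only ever talks to one of them.
payments = [
    {"from": "john", "to": "jamie", "amount": 5000},
    {"from": "john", "to": "unknown", "amount": 3000},
]
calls = [
    {"from": "john", "to": "jamie"},
    {"from": "jamie", "to": "john"},
]
silent = paid_but_never_called(payments, calls, "john")
```

Run the same query for the other four senders and you can see whether the same number keeps turning up.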
Safaricom probably uses this number crunching to derive things like
- New products e.g. tariffs
- Promotions e.g. free calls from minute x
- Pricing & price adjustments
- Optimization of infrastructure
- Competition containment (what is the highest we can charge for inter-network connectivity while still making money, staying clear of the regulator and blacking the eye of other networks)
What horrifies me would be government having direct access to that information. That cannot be a good thing!
Here are some tweets I've exchanged with the Director Of Corporate Affairs this morning
BTW any lawyers care to chip in on the previous post?