
  • Hugh Miller named 2021 Actuary of the Year

    We’re delighted to announce that Taylor Fry Principal Hugh Miller has been awarded 2021 Actuary of the Year by the Actuaries Institute.

    From his earliest days as a graduating student, taking home the University of Sydney’s University Medal in 2005, Hugh was already hinting at a distinguished career to come. He has been a tireless and exemplary role model for the actuarial profession – a powerful voice for excellence, thought leadership and the potential for actuaries to make a tangible difference in society.

    Focusing primarily on the social sector, Hugh uses data in actuarial and advanced analytics projects to improve government policy across employment, welfare, disability and homelessness. This work has helped break ground in the way governments approach investment in society’s most vulnerable communities.

    Hugh uses data in actuarial and advanced analytics projects to improve government policy

    He is a prolific commentator and author in mainstream media and at speaking events, bringing a public spotlight to the profession, especially in tackling some of the most complex issues facing society. In 2020, he led the development of, and was media spokesperson for, the Australian Actuaries Intergenerational Equity Index (AAIEI) and the Green Paper Mind the Gap – The AAIEI, which he co-authored with Taylor Fry colleagues Ramona Meyricke and Laura Dixie.

    Most recently, in 2021, he again demonstrated his commitment to rigorous research and development in an update to the AAIEI, launching a second Green Paper, A narrow escape? The 2021 AAIEI. This Green Paper series has attracted significant media attention and accolades to the profession, with Hugh featured in publications spanning the Sydney Morning Herald, The Australian Financial Review and The New Daily.

    Beyond these recent milestones, Hugh has enjoyed a long and productive association with the Actuaries Institute. In addition to winning Institute awards including the Public Policy Essay Competition (2017) and the AM Parker Prize for most outstanding paper (2011), he was a member of the Data Analytics Practice Committee for three-and-a-half years, and currently sits on the Institute’s Public Policy Committee. In his role as Data Analytics Editor on the Actuaries Digital Editorial Committee, Hugh also pens the regular column Normal Deviance on topics of interest and urgency affecting the industry.

    The combination of Hugh’s impressive body of work, media savvy and numerous awards is testament to his extraordinary skill in bringing actuaries and their highly specialised work to a broad audience, and articulating their growing importance in solving real-world problems.

    Congratulations Hugh on this well-deserved achievement.

    This is an edited version of an original article published by Actuaries Digital.

  • RADAR 2021

    Welcome to RADAR, Taylor Fry’s inside look at the general insurance industry, the state of the market and what it means for insurers.

    In another head-spinning but generally improved year, the issues are big and nuanced – with COVID-19, new customer regulations and climate change the biggest of all. The impacts are mixed and wide ranging, from ongoing positive effects in motor, to the risk of increasing mental health claims in workers compensation and the significant insurer obligations towards improving customer outcomes across all lines.

    Add to this the shockwaves of the IPCC climate report, cybersecurity concerns for directors and officers, and the urgency of affordability – as householder premiums continue to rise yet the class remains unprofitable – and it’s clear the challenges are complex. Rising to them will require depth, insight and agile thinking. In our class-by-class analysis, we carefully unpick the issues to make sense of the way forward …

    Overall profitability

    General insurers rebounded from a disappointing FY2020, with overall net profit after tax up 17.6% in FY2021. Premium increases were particularly evident in householders, domestic motor and professional indemnity. Reductions in gross claims were achieved despite significant increases in provisioning for COVID-19-related business interruption (BI) claims and the high cost of natural catastrophe claims in FY2021.

    Impacts of COVID-19

    Insurers continue to be significantly but variously impacted by COVID-19, with BI provisions an area of focus. Insurers eagerly await a decision on the second BI test case in September 2021, after the first COVID-19 BI test case prompted increased BI provisions and a spike in commercial property claims in Q4 2020.

    Ongoing border closures and restrictions on movement continue to have positive effects in motor, which is experiencing fewer collision claims, but negative effects for travel insurance. Looking forward, COVID-19 impacts may reduce as vaccination take-up rates improve and lockdowns are eased.

    Affordability impacted by catastrophe events

    Several large natural catastrophe events impacted property insurers during FY2021, most notably the Halloween hailstorm in central Queensland ($940m industry loss), floods in the Hunter Valley and mid north coast ($650m industry loss), and Tropical Cyclone Seroja ($273m industry loss).

    In an effort to address affordability, a reinsurance pool will commence in July 2022, covering cyclone and flood-related damage in Northern Australia, backed by a $10 billion government guarantee.

    Climate change

    The potential for climate change to increase the cost of natural disasters remains a key concern. The latest assessment report from the Intergovernmental Panel on Climate Change (IPCC), released in August 2021, found that climate change and its impacts on extreme weather events were accelerating.

    In recognition of escalating climate change risks, the government allocated $600m in funding for a National Recovery and Resilience Agency to develop initiatives that reduce the risk and lessen the impact of catastrophic weather events on communities.

    Mental health

    Workers are at increased risk of psychological injury claims, given the additional pressure created by changes in work demands, restrictions on movement and working from home. Primary mental injury claims have increased in some publicly underwritten states and have the potential to become a future cost pressure for privately underwritten states as well.

    Consumer regulations

    FY2021 was an important year for insurers implementing new consumer regulations, especially in addressing conduct and disclosure obligations. Directors and officers (D&O) insurance in particular is a changing landscape, as company officers come to grips with complex, fast-changing risks such as cybersecurity, as well as the pandemic and climate change. On the flip side, while class actions have adversely impacted D&O claims over several years, recent continuous disclosure and litigation funding reforms may help stem the growth in class actions in future.

    Download RADAR for more expert insights on the shifts and trends in the industry to help you navigate the uncertainty and discover opportunity in our evolving insurance landscape.

  • Data privacy – could traditional approaches work for machine learning models?

    In the second of our two-part series on privacy in the age of big data, we explore the potential to apply data privacy approaches used in traditional settings to machine learning models. We also look closely at emerging approaches such as ‘machine unlearning’, and why the search for commercially robust solutions to help organisations comply with privacy requirements is no simple task.

    Our lives are increasingly tied to artificial intelligence and machine learning, yet the consequences seem unclear – especially for our privacy – as countries scramble to navigate and keep pace with a world dominated by algorithms.

    What does this mean for our privacy in Australia? In Part 1 of our series, we looked at changes to the Privacy Act 1988 under consideration by the Australian Government, and how these might impact machine learning models, the industries that use them and the consumers they target. While the review is still underway, we identified some practical steps organisations can take now to assess and reduce potential privacy implications of the proposed changes for their machine learning models and pipelines. This leads us to ask: could current approaches to protect people’s data in traditional data contexts – for instance, when sensitive information needs to be masked in government or company data – also be applied to machine learning models as we adapt to this changing privacy landscape?

    Finding ways to keep private data hidden that work for business is increasingly challenging

    In considering the question, we explore:

    1. The challenges in applying data privacy approaches in traditional data contexts
    2. A traditional data privacy approach and its applicability for machine learning – Differential privacy is a system that allows data about groups to be shared without divulging information about individuals. We look at whether its principles can be applied to machine learning models to prevent privacy attacks that allow probabilistic private information about individuals to be inferred, or deduced, with an accuracy significantly greater than a random guess.
    3. Emerging approaches – We investigate whether ‘unlearning’ techniques can be used to reduce the costs and time associated with retraining models in response to requests to delete personal information by customers under the ‘right to erasure’, one of the proposed changes to the privacy act.

    1. Challenges of applying data privacy in everyday settings

    At this stage, the application of data privacy approaches to machine learning models is still a maturing research area. Even in traditional data contexts, masking private information can be difficult, and unexpected privacy issues can still arise even after using standard de-identification techniques that are well developed and commonly implemented.

    For example, the Australian Federal Department of Health unintentionally breached privacy laws when it published de-identified health data records of 2.5 million people online in 2016. Despite the published dataset complying with protocols around anonymisation and de-identification, it was found that the data entries for certain individuals with rare conditions could easily be re-identified by cross-referencing the dataset with a few simple facts from other sources such as Wikipedia and Facebook.

    Re-identifying de-identified data – it’s a worldwide risk

    Re-identification risks following the public release of granular datasets are not unique to Australia, with several recent cases emerging around the world. One such case relates to the National Practitioner Data Bank (NPDB), which is the national database for medical malpractice reports and disciplinary action in the United States. Although the names of individual doctors were removed in the publicly available de-identified dataset, several journalism organisations published investigative reports in 2011 that proved it was possible to match de-identified NPDB data entries to a specific doctor by cross-referencing other publicly available information. In response, the United States Department of Health and Human Services was forced to introduce a strict data use agreement stipulating that users must only use the data for statistical analysis, and specifically not to identify doctors and medical providers.

    Search for solutions

    These two examples demonstrate that even with the correct application of de-identification techniques, challenges commonly arise from the collation of released datasets with other publicly available information. This is a particular issue for granular datasets and individuals with unusual circumstances, often referred to as outliers.

    It’s not all bad news, however, and several techniques are emerging to address these data privacy risks, as the quest to find solutions gains momentum. We explore some of these below and whether they might be applicable in machine learning environments.

    Publicly sharing aggregate statistics gives a bird’s eye view without revealing an individual’s data

    2. Differential privacy – a modern approach for traditional data contexts

    In addition to releasing granular datasets, it is also common for companies and governments to share aggregate statistics and other high-level information about individuals within their datasets. Even these aggregate statistics can reveal information about specific individuals, provided there is sufficient background information available on other individuals within the population.

    How to share without getting personal

    Developed by researchers at Microsoft in 2006, differential privacy is a system that permits the public sharing of aggregate statistics and information about cohorts within datasets, without disclosing private information about specific individuals. It ensures that knowledge of the individual cannot be ascertained with confidence, regardless of what other information is known.

    In practice, differential privacy is implemented by adding noise or randomness to a dataset in a controlled manner such that aggregate statistics are preserved. This approach means it is still possible to learn something about the overall population, but it is much more difficult (if feasible at all) to breach privacy, as attackers cannot confidently extract the exact values of individuals’ private information.

    Fit for outwitting attackers

    To illustrate, we consider a simple example of a dataset that contains information on whether individuals prefer to exercise during the day or at night.

    Instead of publishing information on the specific individuals exercising at each time, statistics such as the mean or variance may be computed and shared based on the aggregate data to help protect individuals’ privacy. For instance, the published results might reveal that 70 out of 100 individuals prefer to exercise during the day, but not which exact people.

    However, without differential privacy implemented, it may still be possible for a malicious attacker to use these aggregate statistics, along with information on a sufficient number of individuals’ exercise preferences, to infer private information for the remaining individuals within the dataset.
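
    To make the risk concrete, here is a minimal Python sketch (with made-up data) of a simple ‘differencing’ attack: an attacker who knows the published aggregate and the preferences of all but one person can recover the remaining person’s answer exactly.

    # Minimal sketch (illustrative data only): how an aggregate statistic plus
    # partial background knowledge can reveal one person's private value.

    import random

    # The private dataset: 100 people, each preferring 'day' or 'night' exercise
    population = ['day'] * 70 + ['night'] * 30
    random.shuffle(population)

    # The data custodian publishes only the aggregate count
    published_day_count = sum(1 for p in population if p == 'day')  # 70

    # Suppose the attacker has learned 99 of the 100 preferences from other sources
    known = population[:99]
    target_true_value = population[99]

    # Differencing attack: subtract what is known from the published aggregate
    inferred = 'day' if published_day_count - known.count('day') == 1 else 'night'

    print(f"Published aggregate: {published_day_count} of 100 prefer day")
    print(f"Inferred value for the remaining person: {inferred}")
    print(f"True value: {target_true_value}")  # the inference is exact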

    Make some noise with the flip of a coin

    Differential privacy techniques can be implemented to address these risks. In its simplest application, random noise can be added through the flipping of a coin for each entry of the Exercises day or night? column (or field) in the data, as we illustrate in the diagram below. If the flipped coin shows heads, then no adjustment is made and the true entry of the Exercises day or night? column is returned. On the other hand, if the flipped coin shows tails, the coin is flipped again and the entry adjusted to be day (heads) or night (tails). These arbitrary small substitutions protect individuals’ privacy as it means that potential attackers can never know with certainty whether the output is randomly generated or the truth.

    After the introduction of random noise, the statistics may reveal that the number of people who prefer to exercise during the day is 68 or 73, rather than the exact number of 70. This inaccuracy helps to preserve the privacy of individuals but has very little impact on the patterns of groups within the dataset – around 70 per cent of people still prefer to exercise during the day.
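
    For readers who like to see the mechanics, the following minimal Python sketch (again with made-up data) implements the coin-flip mechanism described above, a simple form of randomised response: each true answer is kept on heads and replaced by a second coin flip on tails. The reported count is noisy and pulled towards an even split, yet because the noise process is known, the overall proportion of around 70 per cent can still be estimated.

    # Minimal sketch of randomised response (illustrative data only).
    # Heads: report the true answer. Tails: flip again and report that coin.

    import random

    def randomised_response(true_value: str) -> str:
        if random.random() < 0.5:                  # first flip: heads, tell the truth
            return true_value
        return random.choice(['day', 'night'])     # tails: flip again, report the coin

    population = ['day'] * 70 + ['night'] * 30     # true preferences of 100 people
    noisy = [randomised_response(v) for v in population]
    noisy_day_count = noisy.count('day')

    print(f"True count preferring day: {population.count('day')}")   # 70
    print(f"Noisy reported count:      {noisy_day_count}")           # pulled towards 50, varies each run

    # Because the noise process is known, the true proportion can still be estimated:
    # P(report 'day') = 0.5 * p_true + 0.25, so p_true = 2 * (p_report - 0.25)
    estimated_p = 2 * (noisy_day_count / len(noisy) - 0.25)
    print(f"Estimated true proportion: {estimated_p:.2f}")           # a noisy but unbiased estimate of 0.70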

    Introducing inaccuracy could thwart an attacker but preserve patterns and protect people’s privacy

    Well-deserved reputation for keeping people’s data safe

    Differential privacy makes it possible for companies and governments to collect and share aggregate information about individuals’ behaviour, while maintaining the privacy of individual users. It has earned a well-deserved reputation for providing principled and powerful mechanisms for ensuring privacy, and is an increasingly common approach in data governance settings. For example, Facebook has used differential privacy to protect data made publicly available to researchers analysing the effect of sharing misinformation on elections, while Uber has used it to detect statistical trends in its user base without exposing personal information.

    Applying differential privacy to machine learning models

    In broad terms, the introduction of noise under differential privacy conflicts with the philosophy of machine learning models, which typically rely on individual variation. Nevertheless, researchers have developed tools to apply differential privacy in machine learning, with the goal of limiting the impact of individual records used in the training dataset (data that helps to ‘train’ an algorithm or machine learning model) – particularly sensitive features – on model outputs. In turn, this limits the information about individuals that can be inferred from the model and its outputs.

    Differential privacy can be implemented in machine learning algorithms in different ways depending on whether the task is supervised or unsupervised learning (under unsupervised learning, models work on their own, typically with unlabelled data, to discover patterns and information that were previously undetected). Common places where noise can be introduced into machine learning models include the training dataset, the predicted model outputs and the gradients (in simple terms, the signals that guide how a model’s parameters are updated as it ‘learns’).
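
    As a rough illustration of one of these options, the sketch below injects noise at the gradient step, in the spirit of differentially private stochastic gradient descent: each record’s gradient is clipped to bound its influence, and Gaussian noise is added before the parameters are updated. This is a simplified toy on synthetic data with hypothetical noise settings, not a production differential privacy implementation, and the noise scale is not calibrated to any particular privacy budget.

    # Toy sketch of noise injected at the gradient step (in the spirit of DP-SGD).
    # Simplified: one-parameter linear regression, per-example clipping + Gaussian noise.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative training data: y is roughly 2 * x
    X = rng.normal(size=(200, 1))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

    w = np.zeros(1)
    clip_norm = 1.0     # bound on any single record's gradient contribution
    noise_scale = 0.5   # hypothetical noise multiplier (larger means stronger privacy)
    lr = 0.1

    for step in range(200):
        idx = rng.choice(len(X), size=32, replace=False)   # random mini-batch
        per_example_grads = []
        for i in idx:
            residual = X[i] @ w - y[i]
            g = residual * X[i]                                            # gradient for one record
            g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))      # clip its influence
            per_example_grads.append(g)
        grad = np.mean(per_example_grads, axis=0)
        grad += rng.normal(scale=noise_scale * clip_norm / len(idx), size=grad.shape)
        w -= lr * grad

    print(f"Learned weight: {w[0]:.2f} (the true value used to generate the data is 2.0)")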

    Adding ‘noise’ or randomness to a dataset puts a barrier between attackers and private information

    Model performance vs privacy protection – a precarious balance

    However, studies to date have found that while current mechanisms to apply differential privacy to machine learning models can reduce the risk of privacy attacks, they come at the cost of model utility and rarely offer an acceptable balance between the two. This means differential privacy settings that add more noise in order to provide strong privacy protections typically result in useless models, whereas settings that reduce the amount of noise added in order to improve model utility increase the risks of privacy leakage.

    The cost to model utility is also reflected in the Warfarin study we discussed in Part 1 of our series, which considered a regression model that predicted the dosage of Warfarin using patient demographic information, medical history and genetic markers. Researchers also showed that the use of differential privacy techniques to protect genomic privacy substantially interfered with the main purpose of this model, increasing the risk of negative patient outcomes such as strokes, bleeding events and mortality beyond acceptable levels.

    The hard truth about preventing privacy attacks

    Despite differential privacy’s reputation, it cannot yet be implemented for challenging tasks such as machine learning models without substantial privacy compromises being made to preserve utility. Researchers concluded that alternative means of protecting individual privacy must be used instead in settings where utility and privacy cannot be balanced, especially where certain levels of utility performance must be met.

    Achieving effective model performance and privacy protection is an evolving balancing act

    3. ‘Unlearning’ techniques

    Also in Part 1, we explored the implications of the ‘right to erasure’ for machine learning models, and found that in addition to the deletion of the individual’s data itself, any influence of the individual’s data on models may also be required to be removed upon customer request.

    Striving for efficiency

    The most straightforward approach to achieve this is to retrain machine learning models from scratch using an amended dataset excluding the individual’s data, but this is often computationally costly and inefficient, particularly for large datasets and frequent erasure requests.

    Alternative ‘unlearning’ techniques have been developed with the aim of avoiding these inefficiencies. These techniques are applied to ensure that a model no longer uses the data record that has been selected for erasure – in other words, they guarantee that training a model on a person’s data record and unlearning it afterwards will produce the same model distribution as if it were never trained on the data record at all. But ‘forgetting’ data is not an easy path for a model.

    Why is unlearning so challenging for machine learning models?

    Applying and implementing unlearning techniques for machine learning models is not straightforward for the following reasons:

    • Limited understanding of how each data record impacts the model and its parameters – Little research has been conducted to date to measure the influence of a particular training record on parameters for most machine learning models, with the few techniques tested so far found to be computationally expensive for all but the simplest of machine learning models.
    • Randomness in training methods and learning algorithms – A great deal of randomness exists in the training methods for most complex machine learning models; for example, small batches of data are randomly sampled from the training dataset during model training. By design, learning algorithms are applied to search a vast hypothesis space (the set of candidate models from which the algorithm determines the best approximation of the target function).
    • Incremental training procedures – For most machine learning, the development process is incremental, meaning that an initial model is developed and then incrementally improved or updated as new data becomes available. As such, it is complex to remove the impact of training a model on an individual training record at a particular stage, as all subsequent model updates will depend on that training point, in some implicit way.

    Break it down – homing in on ‘slices’ and ‘shards’ of data

    One emerging unlearning technique that has been designed to overcome these challenges is Sharded, Isolated, Sliced and Aggregated (SISA) training, a framework that speeds up the unlearning process by strategically limiting the influence of an individual data record during training.

    Under SISA training, the training data is divided into multiple shards, or pieces, so that each individual data record is contained in only one shard. By training models in isolation across each of these shards, only the affected models need to be retrained when requests to erase an individual’s data are made, limiting retraining costs as each shard is smaller than the entire training dataset.

    In addition, each shard’s data can be further divided into slices and presented incrementally during training, rather than training each model on the entire shard directly. Slicing further decreases the time taken to unlearn, albeit at the expense of additional storage, as model retraining can be started from the last known point that does not include the data record to be unlearned (identified from records of the model parameters that are saved before introducing each new slice).
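
    The following minimal sketch (illustrative data and a deliberately simple constituent model) shows the sharding mechanics: one model is trained per shard, predictions are aggregated across shards, and an erasure request triggers retraining of only the shard that held the deleted record.

    # Minimal sketch of SISA-style sharding (illustrative only): one model per shard,
    # predictions aggregated by voting, and erasure retrains only the affected shard.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative dataset: 1,000 records, binary label driven by the first feature
    X = rng.normal(size=(1000, 3))
    y = (X[:, 0] > 0).astype(int)

    N_SHARDS = 5
    shard_ids = rng.integers(0, N_SHARDS, size=len(X))   # each record lives in one shard

    def train_model(X_shard, y_shard):
        """A deliberately simple 'model': class centroids for nearest-centroid prediction."""
        return {c: X_shard[y_shard == c].mean(axis=0) for c in (0, 1)}

    def predict(model, x):
        return min(model, key=lambda c: np.linalg.norm(x - model[c]))

    # Train one constituent model per shard, in isolation
    models = {s: train_model(X[shard_ids == s], y[shard_ids == s]) for s in range(N_SHARDS)}

    def ensemble_predict(x):
        votes = [predict(models[s], x) for s in range(N_SHARDS)]
        return int(np.round(np.mean(votes)))

    # Erasure request for record 42: only the shard holding it needs retraining
    record_to_erase = 42
    affected_shard = shard_ids[record_to_erase]
    keep = (shard_ids == affected_shard) & (np.arange(len(X)) != record_to_erase)
    models[affected_shard] = train_model(X[keep], y[keep])

    print(f"Retrained shard {affected_shard} only ({keep.sum()} records) "
          f"instead of all {len(X)} records")
    print("Example prediction:", ensemble_predict(X[0]), "| true label:", y[0])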

    SISA training divides data into smaller shards to assist models in unlearning a person’s information

    DaRE to dream – random forests spread their roots

    Another emerging technique that supports adding and removing training data with minimal retraining is data removal-enabled (DaRE) forests, a variant of random forests. Random forests are a popular and widely used machine learning model consisting of an ensemble of decision trees (another type of machine learning model that successively splits the data into different segments along ‘branches’ and ‘leaves’).

    DaRE makes retraining more efficient through two techniques:

    • Reducing dependency of the model structure on the dataset
    • Only retraining portions of the model where the structure must change to match the updated dataset.

    DaRE randomises both the variables used and the thresholds adopted for splitting in the upper layers of trees so that the choice is completely independent of the dataset and so this part of the model never needs to be changed or retrained. As splits near the top of each tree contain more data records than splits near the bottom and are therefore more expensive to retrain, strategically placing random splits near the top of the tree avoids retraining the parts that are most costly.

    This randomised structure also helps to reduce the retraining required, as individual data records are isolated to certain parts of the tree (‘subtrees’). When adding or removing data, stored data statistics for each level of the subtree are updated and used to check if a particular subtree needs retraining, which limits the number of computations through the data.
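
    To illustrate the intuition (not the full DaRE algorithm), the toy sketch below chooses its top-level split at random, independently of the data, and keeps only simple label counts in each leaf. Deleting a record then means updating the stored counts on one side of the tree, while the random split itself never needs retraining.

    # Toy sketch of the DaRE intuition (illustrative only): a random, data-independent
    # upper split plus data-dependent leaf statistics that are cheap to update.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative data: features scaled to [0, 1], binary labels
    X = rng.uniform(size=(500, 4))
    y = (X[:, 2] > 0.5).astype(int)

    # Random upper split: feature and threshold drawn without looking at the data
    split_feature = rng.integers(0, X.shape[1])
    split_threshold = rng.uniform(0, 1)

    def leaf_stats(labels):
        """Data-dependent part of the model: label counts stored at each leaf."""
        return {"n": len(labels), "positives": int(labels.sum())}

    left_mask = X[:, split_feature] <= split_threshold
    leaves = {"left": leaf_stats(y[left_mask]), "right": leaf_stats(y[~left_mask])}

    def predict(x):
        leaf = "left" if x[split_feature] <= split_threshold else "right"
        stats = leaves[leaf]
        return int(stats["positives"] * 2 >= stats["n"])   # majority class in the leaf

    # Erasure request for record 7: the random split is untouched; only the stored
    # counts in the leaf that held the record need updating.
    i = 7
    leaf = "left" if X[i, split_feature] <= split_threshold else "right"
    leaves[leaf]["n"] -= 1
    leaves[leaf]["positives"] -= int(y[i])

    print(f"Record {i} removed from the '{leaf}' leaf; the random top split was not retrained")
    print("Example prediction:", predict(X[0]), "| true label:", y[0])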

    But are these techniques any good?

    Studies using real-world datasets have shown that both of these techniques are materially faster than retraining models from scratch, while sacrificing very little in predictive model performance. For instance, researchers found that, on average, DaRE models were up to twice as fast as retraining from scratch with no loss in model accuracy, and two to three times faster if slightly worse predictive performance was tolerated.

    However, techniques do not always provide the expected benefits. Testing revealed that these speed-up gains from SISA training exist only when the number of erasure requests is less than three times the number of shards. Care must be taken when increasing the number of shards, as using a smaller amount of data to fit each model may make them less accurate, particularly for complex learning tasks.

    The fine print

    Other drawbacks of these emerging techniques are the large storage costs associated with them and their limited applicability across all machine learning algorithms. For instance, SISA training works best when models learn by iterating through the data (such as via stochastic gradient descent) and is not suitable for algorithms that must operate on the entire dataset at once.

    There are also likely to be challenges in implementing them in practice, as these approaches cannot be retrofitted onto current systems and would require retraining of existing models and a fundamental redesign of existing machine learning pipelines with unclear effects.

    DaRE reduces the need to retrain models to ‘unlearn’ data by splitting upper parts of decision trees

    The future looks …?

    The proposed changes to the privacy act may place significant governance and compliance burdens on organisations. Moreover, if tough positions are taken on items such as privacy of inferred data and right to erasure, then it may not even be feasible for organisations to comply with these requirements while continuing to operate complex and evolving machine learning systems.

    Although the application of existing data privacy approaches and emerging unlearning techniques to machine learning have shown some promise, this area of research is still in the very early stages of development. We cannot be certain that further research will result in commercially robust solutions that allow organisations to avoid investing in onerous and expensive procedures to ensure compliance. As shown by the unacceptable utility-privacy compromise for differential privacy, there may also be some fundamental limitations on achieving compliance via shortcuts.

    In future, it will be important to promote and invest in research for solutions that maintain privacy, while also providing other desirable model properties such as transparency and explainability. In the interim, if the proposed amendments to the privacy act are enacted and interpreted as applying to machine learning models, organisations may be forced to significantly simplify their modelling approaches and infrastructures to have a better chance of meeting compliance obligations.

  • Australian Actuaries Intergenerational Equity Index Update

    The 2021 Australian Actuaries Intergenerational Equity Index update and accompanying Green Paper show the equity gap between Australia’s generations closed slightly after ‘a year like no other: 2020’.

    Commissioned by the Institute and developed by Taylor Fry’s Hugh Miller, Ramona Meyricke, Laura Dixie and Matthew Bray, the Index takes a broad view of 24 indicators across six domains to track how wealth and wellbeing for different generations change over time. The domains include economic, housing, social, health and disability, education, and the environment.

    The update released today shows a drop in the index values for those aged 65 to 74, and an increase for those aged 45 to 54 and 25 to 34. This reversal breaks a seven-year streak of growing inequity between the youngest and oldest cohorts.

    The Index shows that 2020 is an interesting mix of temporary spikes, continuation of long-term trends, and opportunities to innovate in the policy space.

    For more information, head over to the Actuaries Institute website.

  • The Australian privacy act is changing – how could this affect your machine learning models?

    Countries across the globe are grappling with how to protect people’s privacy in the age of big data. In the first of a two-part series, we look at changes to the privacy act under consideration, and how to address the potential impacts for industry and consumers in a world increasingly tied to artificial intelligence and machine learning.

    Community concerns have become more urgent in recent years regarding the way businesses collect, use and store people’s personal information. In response to a recommendation from the Australian Competition and Consumer Commission’s (ACCC) Digital Platforms Inquiry (DPI), the Australian Government is reviewing the Privacy Act 1988 to ensure privacy settings empower consumers, protect their data and best serve the Australian economy.

    Of the several proposed legislative changes, we outline three scenarios being considered and how they may impact organisations that collect and process customer data – in particular, those organisations that use machine learning algorithms in privacy-sensitive applications, such as predicting lifestyle choices, making medical diagnoses and facial recognition. We break down the three proposed changes to the privacy act below:

    1. Expansion of the definition of personal information to include technical data and ‘inferred’ data (information deduced from other sources)
    2. Introduction of a ‘right to erasure’ under which entities are required to erase the personal information of consumers at their request
    3. Strengthening of consent requirements through pro-consumer defaults, where entities are permitted to collect and use information only from consumers who have opted in.

    Algorithms used in privacy-sensitive applications such as lifestyle choices may be affected

    1. Treating inferred data as personal information

    As part of the proposed changes to the privacy act, the Government is considering expanding the definition of personal information to also provide protection for the following:

    • Technical data, which are online identifiers that can be used to identify an individual such as IP addresses, device identifiers and location data.
    • Inferred data, which is personal information revealed from the collection and collation of information from multiple sources. For example, a data analytics company may combine information collected about an individual’s activity on digital platforms, such as interactions and likes, with data from fitness trackers and other ‘smart’ devices to reveal information about an individual’s health or political affiliations.

    How will this affect consumers, organisations and models?

    The possibility of including inferred data as personal information in the privacy act represents a fundamental expansion in what organisations might think of as personal information. Typically, it is considered to be information provided by a user or trusted source that is known with a reasonable amount of certainty. For example, an organisation will know a customer’s age by requesting their date of birth.

    In contrast, inferred information can usually only ever be known probabilistically. Under the proposed change, knowledge such as ‘there is an 80% probability that this customer is between 35 and 40’ could be treated the same way as knowledge such as ‘this customer is 37’.

    This means model outputs may become ensnared in restrictive governance requirements because inferred customer information is often generated as an output of machine learning models. There is even a possibility that governance requirements may be extended to the models themselves, which have been shown to leak specific private information in the training data (data used to ‘train’ an algorithm or machine learning model) to a malicious attacker. The consequences for machine learning model regulation are potentially hugely significant for consumers and organisations, given personal information is subject to many more rights and obligations than the limited set currently applicable to models.

    In addition, newly afforded rights for consumers may mean that they can request information on model origination or trading, and restrict future processing and use of models. On top of this, companies may now be required to design models that comply with data protection and security principles, and to discard models in order to comply with storage limitation principles.

    There are many challenges in protecting against unintended privacy leakages from models

    When machine learning models leak private information

    Privacy attacks on machine learning models can be undertaken with full access to a model’s structure (‘white box’ attack) or limited query access to a model’s observable behaviour (‘black box’ attack). Through this process, it is possible to infer probabilistic private information about an individual with a success rate significantly greater than a random guess.

    Medical leaks

    For example, US studies were performed on a regression model that predicted the dosage of Warfarin, a widely used anticoagulant, using patient demographic information, medical history and genetic markers. Researchers reverse engineered a patient’s genetic markers using only black-box access to the prediction model and demographic information about patients in the training data – with a success rate similar to a model trained specifically to predict the genetic marker.

    Uncanny facial recognition

    Recent studies have also demonstrated that recognisable images of people’s faces can be recovered from certain facial recognition models given only a person’s name and black-box access to the underlying model, to the point where skilled crowd workers could use the recovered photos to identify an individual from a line-up with 95% accuracy. The picture below on the left was recovered with only a name and limited ‘black box’ access to a model. It is eerily similar to the actual image on the right, which was used to train the model.

    A problem with predictive text

    Privacy leakage is particularly problematic for generative sequence models like those used for text auto-completion and predictive keyboards, as these models are often trained on sensitive or private data such as the text of private messages. For these models, the risks of privacy leakage are not restricted to sophisticated attacks, but may even occur through normal use of the model.

    For example, model users may find that ‘my credit card number is …’ is auto-completed by the model with oddly specific details or even obvious secrets such as a valid-looking credit card number. Protecting against this sort of leakage is surprisingly challenging, with research indicating that unintended memorisation can occur early during training, and sophisticated approaches to prevent overfitting are ineffective in reducing the risk of privacy leakages.

    Challenges in preventing privacy leakages

    Machine learning models tend to memorise their training data, rather than learning generalisable patterns, unless specific care is taken in their construction – a phenomenon known as ‘overfitting’.

    Unsurprisingly, theoretical and experimental results have confirmed that machine learning models, including regression and classification models, become more vulnerable to privacy attacks the more they are overfit. What is a lot more surprising is that, while reducing overfitting can offer some protection, it does not completely protect against privacy leakage. Studies have found that models built using stable learning algorithms designed to prevent overfitting were also susceptible to leakage.
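
    A toy sketch of why this matters in practice: the deliberately overfit model below has much lower error on the records it was trained on, so an attacker who can query the model, and who holds candidate records, can guess training-set membership far better than chance simply by thresholding the error. This is an illustrative loss-threshold attack on synthetic data, not a reproduction of any specific study.

    # Toy illustration of overfitting and membership inference (synthetic data only).

    import numpy as np

    rng = np.random.default_rng(0)

    def make_data(n):
        x = rng.uniform(-1, 1, size=n)
        y = x**2 + rng.normal(scale=0.3, size=n)   # noisy quadratic relationship
        return x, y

    x_train, y_train = make_data(30)
    x_out, y_out = make_data(30)                   # records NOT used in training

    # Deliberately overfit model: high-degree polynomial fitted to only 30 points
    model = np.poly1d(np.polyfit(x_train, y_train, deg=12))

    train_err = (model(x_train) - y_train) ** 2
    out_err = (model(x_out) - y_out) ** 2

    # Membership guess: 'was in training' whenever the model's error is small
    threshold = np.median(np.concatenate([train_err, out_err]))
    in_guess = train_err < threshold
    out_guess = out_err < threshold
    accuracy = (in_guess.mean() + (~out_guess).mean()) / 2

    print(f"Mean squared error on training records: {train_err.mean():.3f}")
    print(f"Mean squared error on unseen records:   {out_err.mean():.3f}")
    print(f"Membership guess accuracy: {accuracy:.0%} (random guessing would be 50%)")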

    Consumers may gain the right to erase their data from a model, causing concerns for business

    2. Right to ‘erasure’ – deleting personal information by customer request

    The European Union’s broad-ranging General Data Protection Regulation (GDPR), in effect since 2018, includes a ‘right to erasure’, sometimes known as the ‘right to be forgotten’, which provides European citizens with a right to have their personal data erased under certain circumstances, including when consumers have withdrawn their consent or where the data is no longer necessary for the purpose for which it was originally collected and processed. This serves as a reference point for similar potential rights in the Australian Government’s review of the country’s privacy act.

    Why organisations are cautious

    Although organisations broadly support the introduction of a right to erasure, concerns have been raised about the potential negative impact the deletion of data may have on businesses and public interest. In its submission, Google considers it “important to include a flexible balancing mechanism or exceptions to such a right so as to enable businesses to consider deletion requests against legitimate business purposes for retaining data”.

    Other organisations have proposed that any ‘right to erasure’ needs to be subject to appropriate constraints, which balance requests against legitimate business reasons, compliance with existing regulatory obligations and the technical feasibility and operational practicalities of implementation.

    What could the ‘right to erasure’ mean for machine learning models?

    Given machine learning models can be considered as having processed personal information, a consumer may wish to exercise their right to erase themselves from a model to remove unwanted insights about themselves or a group they identify as being part of, or to delete information that may be compromised in data breaches. Consumers may also seek to prevent the ongoing use of models that incorporate their data to prevent companies from continuing to derive a benefit from having historically held their information.

    What would it mean for business?

    In most circumstances, the removal of a single customer’s data is unlikely to have a material influence on a model’s structure. The ‘right to erasure’ becomes more powerful when exercised collectively through a co-ordinated action by a group of related customers, as their data is more likely to have had a material influence. For example, the use of facial recognition technology for identifying a marginalised group may cause a collection of customers to feel they are disadvantaged by the ongoing use of models trained on their data.

    An immediate challenge is determining what constitutes a group before the right to erasure applies. Pragmatically, it would be difficult to define a general rule, and it is plausible that even an erasure request from a single customer would require the removal of their data from the model, as well as their exclusion from future processing of data. This would be a very tall order for organisations to comply with, to the point of being unworkable in some situations where the cost and time required to comply with erasure requests rapidly outweigh the benefits of using a machine learning model at all.

    Organisations are cautious about proposals to pre-select digital consent settings to off

    3. Strengthening consent – how will it affect model insights?

    What pro-consumer defaults really mean

    With consumers preferring that digital platforms collect only the information they need to provide their products or services, the final ACCC Digital Platforms Inquiry report recommended that default settings enabling data processing for purposes other than contract performance should be pre-selected to off, otherwise known as ‘pro-consumer defaults’.

    Why organisations are cautious

    While organisations strongly support improved transparency for consumers through meaningful notice and consent requirements, they caution against proposals that would force businesses to provide long, complex notices or separate consents for each use of personal information.  They argue that these approaches would cause consumers to switch off and not meaningfully engage with relevant privacy notices and controls, a phenomenon known as ‘consent fatigue’.

    So do modellers have to start from scratch?

    The potential implication of pro-consumer defaults is that the data available to organisations for future training of machine learning models may be limited, as it can be reasonably expected that relatively few consumers would make the effort to deliberately reverse these settings.

    Consider an extreme scenario, whereby historical data that was collected prior to the introduction of the laws is no longer permitted to be used, through the new pro-consumer default settings. Many organisations may struggle in the weeks immediately following the change, and could effectively be required to rebuild models from scratch at a time when very little data may be available.

    Could privacy protections lead to misleading model insights?

    Problems could also arise where the profile and characteristics of customers who opt in to data processing are materially different from the wider customer base who choose the default settings. These issues are likely to be more significant for analysis that is used to support operational and strategic decisions, as model insights may be based on a skewed view of customer behaviour.

    Assessing your privacy risks – steps you can take now

    With the review still underway, the final form of the changes to the privacy act remains uncertain, and the potential ramifications for machine learning models even more so.

    Nevertheless, there are some practical steps organisations can take now to assess and reduce potential privacy implications for their machine learning models and pipelines:

    • Establish or review the composition of a cross-functional working group – The selection of key staff would define the technical, governance and compliance-related tasks that need to be completed, and identify who owns the process in areas such as analytics, privacy, legal, product owners and data.
    • Review data usage policies – These should include explicit policies for how to use certain types of data and in which contexts, so that a consistent approach is maintained across the organisation.
    • Track and document the use of customer data in models – This should include collating the following details for all existing models and pipelines: Original source of the data; applicable privacy statements for each data item and source; how and where the data is used in modelling processes; where possible, the contribution of each data source to model and organisational performance; if warranted, explicit testing of the impact on models of removing specific customer information, particularly for more sensitive information.

    Taking these steps will provide organisations with a good view of where privacy risks may arise, and ensure they are better prepared for the potential implications of changes arising from the review of the privacy act.

    In Part 2 of our ‘machine learning and the privacy act’ series, we explore the potential to apply existing data privacy approaches to machine learning models, what emerging approaches such as ‘machine unlearning’ have to offer, and the search for commercially robust solutions for complying with privacy requirements.

  • Where are the businesses most reliant on JobKeeper (January)?

    As Australian businesses face their first month without the JobKeeper scheme, we look at how take-up of the scheme changed from its start in April 2020 to the latest data release, which covers payments to the end of January 2021. Our map covers the whole of Australia by local government area, and shows the proportion of businesses receiving the payments has varied significantly by region.

    The JobKeeper Payment scheme was a temporary wage support measure for businesses significantly affected by COVID-19. The scheme had three phases, each with its own qualification requirements:

    • Phase 1 ran from April to September 2020
    • Phase 2 ran from October to December 2020
    • Phase 3 ran from January to March 2021.

    The latest data on the JobKeeper scheme reveals that in January 2021, the first month of the third and final phase of JobKeeper, 15% of Australia’s 2.4 million businesses (372,000 businesses) were receiving JobKeeper payments. The number of businesses qualifying for JobKeeper has been very stable throughout each individual phase, which means figures from January 2021 are likely to be a good indicator of overall reliance on the scheme at the end of March 2021.

    At the height of the scheme, 40% of all businesses in Australia were registered to receive JobKeeper payments.

    Our map shows regional variability in the take up of JobKeeper by businesses, notably:

    • Melbourne’s second wave of COVID-19 infections throughout the winter and spring months of 2020 led to a heavy reliance on JobKeeper in the city. Throughout July and August 2020, 46% of Melbourne businesses were receiving JobKeeper payments. In the last phase of the scheme, beginning in January, 23% of the city’s businesses were still dependent on the subsidy.
    • Similarly, Sydney’s December outbreak also seems to have had an impact on the city’s reliance on JobKeeper, albeit not to the same extent as in Melbourne.

    Taylor Fry Principal Alan Greenfield says, “The outbreaks and resulting restrictions incurred in Melbourne and Sydney have had a measurable impact on the demand for JobKeeper in those two cities. In the last phase of the scheme, 23% and 17% of businesses were on JobKeeper in Greater Melbourne and Greater Sydney respectively. This is vastly different to 13%, 13% and 11% for Brisbane, Adelaide and Perth, who have not experienced significant outbreaks or lockdowns.”

    Outside Australia’s capital cities, demand for JobKeeper was high during the first six months of the scheme, but businesses in regional areas have since been less reliant on the subsidy than those in capital cities.

    “The lower reliance on JobKeeper payments in regional areas is partly due to the mix of businesses in those areas, fewer restrictions on movement and smaller rate of change in consumer behaviour compared to cities. One exception is destination tourism areas like Byron Bay and Port Douglas where JobKeeper reliance was still relatively high towards the end of the scheme,” says Greenfield.

    About our map

    We’ve used the latest data (JobKeeper processed applications between April 2020 and January 2021) from the Treasury and the Australian Bureau of Statistics to estimate and map the proportion of businesses in each Local Government Area (LGA) that have registered to receive the JobKeeper payment. LGAs with the highest proportion of their businesses registered to receive the payment are coloured red, and those with the lowest proportion are coloured green.

    The interactive map below shows JobKeeper reliance by Local Government Area over time (updated 7 April 2021).

    https://taylorfry.shinyapps.io/jobkeeper

    How do I use the map?

    Click on an LGA to see what proportion of its businesses are registered to receive the JobKeeper payment.

    Disclaimer

    This page and its contents herein, including all data, mapping and analysis (“Page”), copyright 2021 Taylor Fry, all rights reserved, is provided solely for information purposes. You should not rely on this page for financial advice. Use of the Page by commercial parties and/or in commerce is strictly prohibited. Redistribution of the Page is strictly prohibited. When linking to the page, attribute the Page as Taylor Fry’s COVID-19 Financial Impact Index. The Page relies upon publicly available data from multiple sources that do not always agree. Taylor Fry hereby disclaims any and all representations and warranties with respect to the Page, including accuracy, fitness for use, reliability, completeness, and non-infringement of third party rights. Any use of Taylor Fry’s names, logos, trademarks, and/or trade dress in a factually inaccurate manner or for marketing, promotional or commercial purposes is strictly prohibited. These terms and conditions are subject to change. Your use of the Page constitutes your acceptance of these terms and conditions and any future modifications thereof.

  • Cyber insurance – key issues for insurers

    Spurred on by the pandemic, technology use is on the rise – and, along with it, an increase in cyber attacks, making security a top concern for companies. We look at the growing area of cyber insurance and some of the new approaches insurers will need to succeed in this dynamic environment.

    With greater connectivity in every part of our lives – at work, home and socially – our devices and IT systems have never seemed more exposed. What is the role of cyber insurance in mitigating these risks, how has it evolved and where is it headed? While cyber insurance premiums have grown significantly in the past decade, it’s still a small class of business compared to more traditional liability coverages, with only a few insurers currently offering it in Australia. Despite take-up remaining low, especially among small-to-medium enterprises (SMEs), the market continues to grow, as cyber attacks gain in frequency and sophistication. Astute insurers will be exploring the many new ideas in underwriting and pricing to tackle their challenges now and in the future.

    What is cyber insurance?

    First things first – cyber insurance is an insurance product that protects businesses from financial risks relating to cyber incidents. Policies will usually cover:

    • First party losses – These are losses suffered directly by the insured business
    • Third party losses – These are costs incurred by the insured relating to a cyber event experienced by another party but where fault lies with the insured.

    Cyber insurance is usually defined as a liability product and is often sold as an extension to an existing standard business liability product.

    Cyber losses can also arise from traditional liability policies such as D&O if these policies do not explicitly exclude cyber risks. This is known as silent cyber or non-affirmative cyber cover. As cyber risks evolve over time, however, more insurers are clarifying cyber risk as a separate product and then excluding this risk from their standard liability policies. This means policyholders who require cyber cover need to explicitly take out a policy to cover this risk.

    The underlying risks cyber policies cover are rapidly changing over time

    A brief history

    Where did it all start? Insurance policies for cyber insurance were first developed in the late 1990s. Initially policies provided predominantly third-party cover for companies that provided IT services used by other businesses. As technology advanced and became integral for more companies, cyber insurance expanded, and insurers began offering first-party coverage to any company using technology.

    The cyber insurance scene today

    The growth in cyber policies has resulted in a range of coverage and exclusions in the products offered. Standard coverages for cyber policies, as distinct from more traditional liability coverages, include:

    • Business interruption costs
    • Network security costs
    • Costs arising from theft or fraud
    • Forensic investigation costs
    • Costs related to data loss and restoration
    • Extortion costs
    • Costs associated with any information privacy penalties.

    A changing landscape

    Another point that differentiates cyber insurance from other classes is that the underlying risks the policies cover are rapidly changing over time. As technology becomes more powerful and essential for all organisations, this provides greater opportunities for cyber criminals.

    One example is the increase in targeted ransomware attacks. Attackers using ransomware would previously target anyone they could trick into having a malicious payload delivered to install the ransomware. These were generally home users, whom attackers would extort for a few hundred dollars to regain access to personal files and photos.

    Go phish – a worrying trend

    Now attackers are specifically targeting individual firms and blackmailing them for at least tens of thousands of dollars at a time. These attacks start with reconnaissance and then breaking into the company’s network, predominantly using targeted phishing attacks (otherwise known as spear-phishing). The attackers then exfiltrate data from the company, downloading it to a remote location and encrypting as many files as possible using a scrambling algorithm to which only they hold the key. They then demand a large sum of money to decrypt the files and restore network and system operations.

    As well as an increase in the number of cyber threats, the cost associated with data breaches is also increasing. The average cost of a data breach for Australian organisations was estimated to be $3.35 million in 2020. This was an increase of almost 10 per cent from the previous year. This increase in risk and costs means cyber insurance is becoming more of a necessity for organisations.

    Australia’s increasing regulatory focus

    In Australia, the focus on cyber risks and cyber insurance is increasing. On the regulatory front, the Australian Prudential Regulation Authority (APRA) introduced Prudential Standard CPS 234 in 2019. This standard requires that APRA-regulated entities “take measures to be resilient against information security incidents” and inform APRA of any material information security breaches.

    Under the standard, regulated entities must maintain an information security capability commensurate with the size and extent of threats to information systems to ensure continued operation. The standard does not mandate entities to hold cyber insurance.

    Sharp eye on data collection

    APRA is also currently consulting with insurers on extending its insurance data collection to separately collect premium and claims information for cyber insurance. Currently, cyber cover is included under the public liability class. The increase in cyber policies and limited availability of data for this class has been cited as the reason for the proposed change.

    This increase in focus is not limited to financial services. In November 2020, the Australian Government announced that a cyber security cabinet role would be created, in response to an increase in attempted cyber attacks during 2020. Several of these attacks have been on critical infrastructure providers. Many high-profile cyber attacks also occurred last year in Australia, including:

    • Toll Group, which had two separate ransomware attacks in January and April.
    • Regis Healthcare, which had sensitive data stolen in a ransomware attack in August.
    • Australian Defence Force recruiting system, which was taken offline for 10 days in February to contain a security breach.
    • Levitas Capital, which had its email system compromised by a bogus Zoom invitation. This resulted in $8.7 million in fraudulent invoices being paid. While this money was recovered, Levitas was forced to close its business due to clients withdrawing their funds as a consequence of the attack.

    Phishing scams targeting firms are netting attackers tens of thousands of dollars at a time

    Serious breaches ring in 2021

    Most recently, the new year had barely dawned when the Reserve Bank of New Zealand disclosed on 11 January it had suffered a serious data breach of its file-sharing service provided by California-based data protection firm Accellion.

    About two weeks later, the Australian Securities and Investments Commission (ASIC) reported a cyber security breach, which had occurred on 15 January. The national corporate regulator said one of its servers used for transferring information, including credit licence applications, had been illegally accessed through the Accellion software.

    These incidents highlight how a weakness in a single piece of software can result in cyber events for several different companies.

    Varied corporate view of cyber insurance

    From an insurance perspective, cyber insurance take-up has been limited, particularly in the SME market. The Chubb 2019 Cyber Preparedness Report found only 27 per cent of Australian SMEs have cyber insurance. Reasons for this low take-up may include a belief that existing liability insurance will cover any cyber risks, or a view that cyber risk is relatively low.

    For larger corporates, cyber security is a key focus. With more people working from home under COVID-19 restrictions, businesses have become more reliant on their IT systems, sharpening attention on the associated risks.

    Challenges in pricing for cyber

    Traditional insurance products such as motor, property and liability are generally priced by analysing past claims data against various rating factors. This analysis then allows insurers to estimate the expected future costs for their customers based on their declared rating factors, and this is used to set premiums. It’s an approach requiring a large amount of past claims and rating factor data. It also assumes the underlying risk associated with the product is not changing significantly over time.
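
    To make the contrast with cyber concrete, here is a minimal sketch of that traditional approach – a simple claim-frequency model fitted to past data. The rating factors, field names and figures are hypothetical, chosen only for illustration.

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        # Hypothetical past policy and claims data (fields are illustrative, not a real schema)
        policies = pd.DataFrame({
            "claim_count": [0, 1, 0, 2, 0, 1],
            "exposure":    [1.0, 1.0, 0.5, 1.0, 1.0, 0.8],  # policy-years
            "vehicle_age": [2, 10, 5, 12, 3, 8],
            "driver_age":  [45, 22, 60, 30, 38, 26],
        })

        # Poisson GLM of claim frequency against the declared rating factors,
        # with log-exposure as an offset - the standard structure for classes
        # with plenty of stable historical data
        freq_model = smf.glm(
            "claim_count ~ vehicle_age + driver_age",
            data=policies,
            family=sm.families.Poisson(),
            offset=np.log(policies["exposure"]),
        ).fit()

        # Expected frequency (times an assumed severity) then feeds into the premium
        print(freq_model.predict(policies))

    With only a handful of rows this is a toy, but the structure – plenty of history, stable risk, declared rating factors – is exactly what cyber lacks.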

    For cyber insurance, there is a limited amount of past data, given it’s a relatively new product, and the underlying risk is rapidly changing over time as technology advances. Cyber is different from other insurance products in that the risks transcend geographical borders, highlighting an urgent need for global solutions. Insurers are also generally protective of the exact approach they use for pricing, which adds an extra challenge.

    Some transparency in America

    In the United States, however, pricing structures are more transparent, and insurers are required to file their policies with state regulators. These filings include policy details and the rating structures used to determine premiums. A 2019 study analysed these filings for cyber insurance policies in New York, Pennsylvania and California. It found three main rating structures:

    • Flat rates – This is the same rate for all policies, or flat rates based on a small set of hazard groups. These rating structures tend to be used for policies offered to smaller companies.
    • Base rates with multipliers based on rating factors – This is a standard insurance rating structure used for other classes of insurance. The base rate is usually calculated as a function of the size of the insured (for example, based on revenue, assets or number of employees). The rating factors reflect the industry the insured is operating in, and may include retention limits and risk ratings (see the sketch after this list).
    • Base rates with security questions – This is a more advanced version of the rating factor approach, where the rating factors reflect the cyber security measures the insured has in place. Under this rating structure, a company with more advanced cyber security would pay a lower premium than a company with less advanced cyber security measures, if all other features for the two companies were the same.
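
    To illustrate the second structure, here is a minimal sketch of a base rate scaled by multiplicative rating factors. Every rate, band and factor below is a made-up assumption, not drawn from any actual filing.

        # Hypothetical cyber rating structure: base rate by insured size, scaled by multipliers
        BASE_RATE_BY_REVENUE = {       # annual base premium, by revenue band
            "under_10m": 2_500,
            "10m_to_100m": 7_500,
            "over_100m": 20_000,
        }
        INDUSTRY_FACTOR = {"retail": 1.0, "healthcare": 1.4, "professional_services": 0.9}
        RETENTION_FACTOR = {10_000: 1.1, 25_000: 1.0, 50_000: 0.9}   # by retention (excess)

        def cyber_premium(revenue_band: str, industry: str, retention: int) -> float:
            """Base rate multiplied by the rating factors declared by the insured."""
            return (BASE_RATE_BY_REVENUE[revenue_band]
                    * INDUSTRY_FACTOR[industry]
                    * RETENTION_FACTOR[retention])

        print(cyber_premium("10m_to_100m", "healthcare", 25_000))  # 10500.0

    The third structure simply extends the factor set with answers to security questions – stronger controls, lower multiplier.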

    Are these approaches sufficient? While they go some way to addressing current issues, the industry will need to stay agile – and keep looking at what might come next – if it’s to assess risk adequately in an evolving cyber environment.

    Australia’s increase in cyber crime

    In its latest cyber threat report, the Australian Cyber Security Centre (ACSC) has noted that “malicious cyber activity against Australia’s national and economic interests is increasing in frequency, scale and sophistication”. The report notes that “Australia’s relative wealth, high levels of online connectivity and increasing delivery of services through online channels make it very attractive and profitable for cybercrime adversaries”. Over the year to 30 June 2020, the ACSC responded to 2,266 cyber incidents, and also noted an increase in spear-phishing campaigns during the COVID-19 pandemic.

    The need to evolve

    This increasing underlying risk suggests traditional insurance pricing techniques also need to evolve to accommodate cyber insurance – so that insurers continue to price the risks adequately, and so that insured parties understand their specific vulnerabilities and what they can do to reduce the chance of falling victim to cyber crime.

    Traditional insurance pricing and underwriting techniques are based on an annual review of an insured’s risk. Individual adjustments for the insured’s risk are often based on an underwriter’s knowledge of the insured, combined with the recent claims experience for the insured. The speed at which risks change for cyber perils means it’s important insured entities understand what controls they need to implement to reduce the risk and also what actions they could take to mitigate the risks if an attack occurred.

    Future-thinking insurers looking to respond dynamically will lead the way ahead

    Assessing risk – the new ways forward

    Recently, insurers have begun using scanning tools to assess the individual vulnerability for each insured entity. Companies such as UpGuard, BitSight and Security Scorecard provide scores that measure a company’s cyber security posture based on automated tests of the company’s online systems. These tools formulate a score based on system vulnerabilities, reputation risk, phishing and malware, email security and network security. As well as providing a measure of overall risk, the tools can also provide feedback on areas an insured company can address to improve its cyber resilience.
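
    As a rough illustration of the idea – not the vendors’ actual methodologies, which are proprietary – a posture score might combine category sub-scores along these lines. The weights and sub-scores below are assumptions.

        # Toy cyber-posture score: weighted average of category sub-scores (0-100, higher is better)
        WEIGHTS = {
            "system_vulnerabilities": 0.30,
            "reputation_risk": 0.15,
            "phishing_and_malware": 0.20,
            "email_security": 0.15,
            "network_security": 0.20,
        }

        def posture_score(sub_scores: dict) -> float:
            """Overall score as a weighted average of the category sub-scores."""
            return sum(WEIGHTS[cat] * sub_scores[cat] for cat in WEIGHTS)

        example = {
            "system_vulnerabilities": 55,   # weakest area - also the clearest feedback to the insured
            "reputation_risk": 80,
            "phishing_and_malware": 70,
            "email_security": 90,
            "network_security": 65,
        }
        print(posture_score(example))  # 69.0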

    In New Zealand, IAG recently launched a cyber security tool for SMEs, which has been developed in partnership with UpGuard. Brokers such as Willis Towers Watson and global network TechAssure are also developing similar services designed to assist companies to assess their cyber risks.

    Other tools insurers could use in their pricing and underwriting of cyber insurance include:

    • Dark web scans – These can be used to look for any mentions of the insured company that may indicate it’s a potential target, has been previously compromised or suffered a data breach.
    • Penetration testing – This is similar to the scanning tools mentioned above, but more intensive and targeted. The insurer’s security analysts attack the insured company’s systems, attempting to exploit vulnerabilities and so assess the strength of its security.
    • Managed service partnerships – By partnering with security service providers, the insurer could provide outsourced security management and monitoring services bundled into the policy.

    Proactive insurers for the win

    Cyber insurance is a rapidly evolving class of business, in terms of the underlying risks and the information available to insurers for pricing and underwriting. Forward-thinking insurers will be looking to supplement their traditional approaches with this new information. The standard bearers in this market will be the proactive insurers able to also provide advice to their customers on weaknesses in their cyber security, and respond dynamically, by way of improved terms or prices, to actions taken by their customers to mitigate these vulnerabilities.

  • Analytics quick wins: Five New Year resolutions to thrive in a climate of economic uncertainty

    How can advanced analytics help you through the COVID-19 storm? We reveal some ways to strengthen your business and prepare you for long-term success.

    The global spread of COVID-19 has created a financial and healthcare crisis no one seemed fully prepared to meet. As stimulus programs wind down, the extent of the consequences will become clearer.

    On the economic front, most businesses have reviewed their costs and made changes to protect their long-term future, with the burden often falling on employees in the form of job losses and reduced hours.

    In March, US weekly jobless claims reached a peak of nearly seven million. Before COVID-19, that figure had not exceeded 700,000 since the 1960s.

    But job counts tell only part of the story. According to the World Economic Forum, 11.7% of working hours were lost in the first three quarters of 2020, with the impact felt disproportionately by low-income earners.

    Now is a perfect time to consider making some New Year analytics resolutions

    Broaden your thinking
    As business leaders struggle in this tough economic climate, reducing their cost base is an understandable response to reduced demand and increased uncertainty, but it’s not the only way to survive. Consider, for example, the opportunities advanced analytics present in this new environment. COVID-19 has turbocharged digital transformation and changed the way consumers think about product/service delivery. A year ago, no one could have predicted the rapid rise in telehealth, for example.

    As we leave 2020 behind, now is a perfect time to consider making some New Year analytics resolutions.

    Resolution 1: I’m going to boost my organisation’s analytics capability

    The ability to optimise business settings typically reflects the maturity of an organisation’s analytics capability. Most organisations attempt to gain an edge from their ever-growing volumes of customer data, but the sophistication with which they harness the potential of those datasets varies significantly. Are your analytical outputs still focused on retrospective analysis of the past? Or have you moved into future-oriented predictive and prescriptive analysis, as represented in the chart below?

    Capability Chasm

    Source: Based on an original chart by Erik Marcade, SAP

    Quick wins stem from thinking about actions that move you towards the next level of maturity.

    Focusing your BI reporting in the right places
    Most organisations will employ business intelligence (BI) reporting to help management understand what is happening in the business. An increasing number of entities now employ interactive BI reporting and visualisation tools to allow users to delve into the BI data and try to understand the reasons behind what is happening.

    When developing your BI reporting process, it’s important to think about the measures you track and report, and how they relate to one another. This will increase the focus of your reporting as well as the value of your data, and ensure you’re not swamping users with too much information. It might help to consider BI reporting as a hierarchy of measures categorised as:

    • End outcomes that determine business fundamentals – for example, product sales and revenue
    • Lead indicators that give early clues as to how the end outcomes will perform in the future – think about your end-to-end product sales and customer lifecycles to derive these. For example, home-loan approvals as a lead indicator of home-loan drawdowns.
    • Behavioural indicators – tracking how your customers interact (or don’t interact) with your business and the activity of your sales staff will help inform future operations that ultimately determine end outcomes. For example, the number of customers using online sales channels.

    No time like the present to develop your predictive analytics
    Increasingly, public and private sectors are using predictive analytics to guide decision-making. If your organisation has yet to make that step, now might be the time to start. It doesn’t have to be hard or complex. You might consider one business problem, such as customer retention, and see how a machine learning solution can improve your customer engagement strategies – for example, by identifying the customers who are most likely to take their business to a competitor in the near term.
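
    As a minimal sketch of that retention example – with hypothetical field names and made-up data – a first model can be as simple as this:

        import pandas as pd
        from sklearn.ensemble import GradientBoostingClassifier

        # Hypothetical customer snapshot; 'churned' = took their business elsewhere last year
        customers = pd.DataFrame({
            "tenure_years":   [1, 8, 2, 5, 0.5, 10, 3, 7],
            "products_held":  [1, 3, 1, 2, 1, 4, 2, 3],
            "complaints_12m": [2, 0, 1, 0, 3, 0, 1, 0],
            "churned":        [1, 0, 1, 0, 1, 0, 0, 0],
        })

        X = customers.drop(columns="churned")
        y = customers["churned"]

        # Fit a simple classifier, then score the whole book so retention offers
        # can be targeted at the customers most at risk of leaving
        model = GradientBoostingClassifier(random_state=0).fit(X, y)
        customers["churn_probability"] = model.predict_proba(X)[:, 1]
        print(customers.sort_values("churn_probability", ascending=False).head())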

    If you don’t have in-house capability, external advice doesn’t have to be an elaborate process to add significant value.

    Regardless of where you are on the maturity scale, the next incremental step might be quick and rewarding.

    Resolution 2: I’m going to shift my thinking on my customers from short-term to long-term

    Unlocking long-term customer value is one of the greatest analytical opportunities for many businesses.

    This is particularly true for industries where people have a long-term ongoing need for the product or service, and the opportunity to differentiate from your competitors is relatively limited.

    Telcos, utilities, banking, insurers and other financial service providers are good examples.

    All businesses will have a focus on their near-term financials. The current-year profit-and-loss account and balance sheet will always retain their allure, as they headline the next set of financial statements market participants see.

    However, near-term financial incentives are often inconsistent with realising long-term customer value. For example, young banking customers (teenagers and students) tend to generate relatively little revenue for banks. With a near-term focus, a bank might invest relatively little in these customers. However, later in their lives they will need more valuable banking products such as a home loan. A long-term focus would see greater investment in these customers in recognition of their long-term customer value.
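
    A toy calculation makes the point. The cash flows, retention rate and discount rate below are all hypothetical, but they show how little of a young customer’s value is visible in the current year.

        # Expected present value of future annual margins, allowing for attrition
        def lifetime_value(annual_margins, retention_rate=0.9, discount_rate=0.08):
            return sum(
                margin * retention_rate**t / (1 + discount_rate)**t
                for t, margin in enumerate(annual_margins)
            )

        # A student customer: little revenue now, a home loan and other products later
        student_margins = [20, 30, 50, 100, 400, 900, 1_200, 1_200, 1_200, 1_200]

        print(f"This year's margin: ${student_margins[0]:,.0f}")
        print(f"Long-term value:    ${lifetime_value(student_margins):,.0f}")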

    Unlocking long-term customer value is one of the key analytical opportunities for many businesses

    Strategies with lasting appeal
    To achieve a pricing and loyalty strategy that maximises long-term shareholder value, it’s essential to understand which individuals will deliver high or low long-term value.

    When you achieve this, you can target your product pricing so you don’t miss potential revenue opportunities or discount in a way that cannibalises existing revenue more than it encourages new sales.

    Additionally, paying attention to existing customers can help avoid the predictable customer attrition that occurs when the business focuses mostly on potential new customers.

    Getting the balance right with your pricing and loyalty strategy is a fine art, not a matter of luck. Well-directed customer analysis could significantly improve your strategy, with major positive ramifications for customer and shareholder value.

    Resolution 3: I’m going to tailor my analytics solution to the problem at hand

    Off-the-shelf advanced analytical products are typically slick in their presentation and the way they can be implemented in your IT infrastructure. They are built with this in mind to appeal to as broad an audience as possible.

    There are plenty of use cases where these can be of great benefit, but sometimes, a generic solution won’t give you the answers to a complex business problem. For example, a retention prediction model might highlight customers who are most likely to take their business to a competitor but offer no insight on what to do about it. In these cases, you may want to consider a tailored solution where you have a need for greater insight.

    A tailored solution can explicitly incorporate your specific product, delivery channel, customer and/or market location factors.

    An advantage in doing things differently
    No two businesses are the same and their analytical solutions shouldn’t be either. Sometimes competitive advantage can stem from doing things differently with your analytics approach.  For example, if your advanced analytics approach allowed you to understand the long-term value of your customers better than your competitors, your business decisions informed by this might yield greater long-term shareholder value.

    Resolution 4: I’m going to gain deeper insight through personalisation

    More and more businesses are attempting to personalise their pricing, marketing and services. If done well, engaging customers with a personalised approach and remaining relevant to their needs is a great strategy.

    Some guiding principles to focus your strategy
    To gain the most out of personalisation, keep two key questions in mind:

    • Is my personalisation approach deep enough to make it genuinely personal?
    • Have I considered whether my approach will generate overwhelming negative sentiment about the business?

    Personalised marketing is a good example. We’ve all received marketing emails from businesses that attempt to talk to us as if they know us and who we are. Most of the time these miss the mark. Broadly speaking, businesses accept this – the product sales from the minority who do engage with the marketing are deemed to be worth it.

    Does the upside outweigh the downside?
    Do you understand the power customers may have to affect your business when they don’t engage? The best case is that they are indifferent to the approach. But some will view it negatively and attach that negative sentiment to the experience and your business.

    Have you tried to establish whether the upside from customers who engage with a personalised approach outweighs the downside from customers who view the engagement negatively?

    A quick review of your personalisation approach may be very enlightening and change the way you think about engaging with your customers.

    With some thoughtful preparation, you can set up a rigorous governance structure

    Resolution 5: I’m going to make sure our analytics governance structure is rigorous

    Many businesses use machine learning and other models to support decision-making, often in real time. In the haste to deliver, it can be easy to overlook putting adequate model governance in place, which can expose your organisation to significant risks. Take, for example, Amazon’s AI recruitment tool from 2014/15, which was biased against women. The tool was built using data from CVs over the prior 10-year period and so reflected male dominance in the tech industry.

    With some thoughtful preparation, you can set up a rigorous governance structure based on these essential components:

    • Risk management framework – Including risk appetite, risk monitoring and controls
    • Governance guidelines – For system design, privacy and fairness guidelines, for example
    • Tracking – Of modelling system performance, changes, accountability and incidents
    • Audit and review – To ensure ongoing fitness for purpose and value realisation.

    Any business can be vulnerable to governance weak spots, especially during times of economic uncertainty. The following considerations highlight common areas of concern that you can address to help strengthen your governance structure:

    1. Design – Have you correctly formulated the model to solve the intended business problem?
    2. Operations – How is the model performing in its regular course of operation? This includes people and processes – such as oversight, authorisation and ongoing performance maintenance – as well as the systems that support the infrastructure.
    3. External issues – Are there any changes outside of the bounds of the model? This includes changes to upstream and downstream processes that cause the model to behave differently from its original design and intention.
    4. Community and regulatory expectations – Does your model meet expected standards? This includes privacy, fairness, and regulatory conditions, such as those governing the sale of financial products.

    Simple steps towards a robust framework
    Putting in place a good governance structure doesn’t have to be hard. A quick review of your current structure will help ensure you are maximising the value from your modelling infrastructure. It will also help ensure you are minimising the risk of experiencing a damaging governance failure at a time when your business may be less resilient than normal to shocks.

    Analytics quick wins – where do you start?
    By having a conversation. Get your stakeholders in a room and think about your analytics strategy and the potential opportunities raised in this article.

    Many businesses have invested heavily in their analytics capability over the past few years. Arguably, economic uncertainty is when this should pay most dividends, as you look to secure your short-term future and position well for the recovery.

  • A practical business guide to the new design and distribution obligations (DDO)

    ASIC’s new DDO regulations offer firms a chance to realise greater business value as well as greater connection to their customers, despite attributing more accountability to distributors.

    On 5 October, the new DDO are due to take effect, outlining product design, distribution and monitoring requirements for organisations – and placing consumers at the heart of product governance. The requirements essentially challenge the assumption that suitable disclosure leads to informed consumer decision-making. In doing so, DDO place obligations on product issuers and distributors to ensure their products meet consumer needs.

    Regulatory Guide 274 sets out ASIC’s interpretation of DDO requirements, expectations for compliance and approach to administering the obligations. The customer-centric approach provides product issuers with a timely opportunity to use their data to best effect. The challenge for product issuers and distributors comes from how to establish effective frameworks that meet DDO requirements and add value by October.

    In this practical guide, we help product issuers navigate their way through the DDO regulations, and suggest ways they can use their data to:

    • Inform the Target Market Determination (TMD) for each product
    • Monitor compliance with DDO requirements on an ongoing basis
    • Move beyond compliance and create value through meeting DDO requirements.

    Step 1: What information should I consider to inform the TMD?

    Firstly, it’s important to define a target market. A target market is defined by the product issuer, who must assess their product as consistent with the objectives, financial situation and needs of a group or ‘class’ of customers. Distribution conditions must be specified, to make it likely that consumers who acquire the product are in the target market. Issuers and distributors must take ‘reasonable steps’ to ensure distribution is consistent with the TMD.

    Setting a suitable definition of ‘class’ of customers, changing systems to capture additional information if required and establishing a suitable monitoring framework takes time. To meet TMD requirements, ASIC considers that product issuers will generally need to:

    • Describe the likely objectives, financial situation and needs of consumers in the target market
    • Describe the product features
    • Explain why the product features are likely to meet consumers’ needs.

    Unlike personal advice, design and distribution obligations do not require product issuers to assess the suitability of products for individual consumers. As mentioned earlier, consideration needs to be given to the likely objectives, financial situation and needs of a class of consumers. However, ASIC does expect issuers to specify the target market with sufficient granularity and to have tested its product. In some cases, it may be necessary to define the target market as including some classes of consumers and excluding others.

    There’s a range of information that product issuers can use to help define the target market. The information available will depend on whether the product is new or existing, and whether the issuer distributes the product directly to consumers or whether distribution occurs through a third party. The following table provides examples of the information organisations may want to consider.

    Experience
    – Reported claims
    – Accepted claims
    – Declined / withdrawn claims
    – Paid claims
    – Lapse rates
    – Disputes + outcomes

    Product design
    – Exclusions
    – Coverage limits
    – General and claims conditions
    – Deductibles
    – Premium charged
    – Eligibility

    Customer / risk information
    – Asset insured
    – Location
    – Policyholder age
    – Employment status
    – Life stage
    – Objectives

    Product issuers should consider how customers’ needs change over time. For example, a comprehensive car insurance policy issued today on a 15-year-old Holden may no longer be appropriate, despite the product being fit for purpose when the car was new.

    Ultimately, it may be necessary to ask additional questions to ascertain whether a consumer is in the target market and to respond appropriately when customers volunteer relevant information. The law provides an exemption from personal advice obligations for determining whether or not a consumer is in the target market.

    Step 2: How do I monitor compliance with DDO requirements?

    Monitoring compliance with DDO requirements involves:

    • Monitoring whether the product reached consumers in the target market. If it has not reached the target market, why not?
    • Assessing how the product performed for consumers. That is, did the product meet consumers’ objectives and needs?

    A TMD must specify suitable review triggers, such as loss ratios consistently falling below a level that indicates diminishing value for the class of consumers. Monitoring performance against these review triggers will support product issuers in deciding whether changes to the product and/or distribution are required. It will allow businesses to make responsive, data-informed decisions that will ultimately benefit their bottom line, and ensure they are in tune with their customers and their concerns.

    Bundled products (such as home and contents cover) should also be considered separately …

    Assessing whether the product has reached the target market

    Provided the target market is well defined, it should be straightforward to assess whether a product has reached the target market, with ASIC emphasising the importance of measuring consumer experience by distribution channel. Bundled products (such as home and contents cover) should also be considered separately as the target market for the combined product is narrower than the market for the two separate products.

    ASIC’s regulatory guide states that issuers and distributors will not be assessed as having failed to take reasonable steps merely because a consumer outside the target market acquires the product. Instead, ASIC’s concern is when a significant amount of distribution is occurring outside the target market, or the distribution is leading to consumer harm. In these circumstances, the product issuer needs to report the significant dealings to ASIC.

    Assessing consumer outcomes

    There’s considerable overlap between the metrics considered to determine the target market for existing products and those used to assess consumer outcomes. In the following table, we list a series of questions for product issuers to consider. Again, it is important to measure consumer outcomes by distribution channel. It may also be necessary to further segment the portfolio by risk or customer type – for example, customer tenure, geographic location of risk insured or customer age.

    Question: Is a reasonable percentage of premium returned to the customer in the form of claim payments, given the risk? That is, what is the financial value of the product to consumers?
    Metric: Gross loss ratio

    Question: Are customers able to claim on the product when they need to? Consideration of rejected and withdrawn claims will inform an assessment of whether distribution practices and/or product terms and conditions are suitable.
    Metrics: Rejected claims*; withdrawn claims

    Question: Are products meeting consumers’ objectives? If not, where are the frictions?
    Metrics: Volume and nature of complaints*; complaint outcomes

    Question: Do consumers continue to see value in the product?
    Metrics: Policy lapse rates; policy cancellation rates

    * We suggest monitoring rejected claims, withdrawn claims and complaints both in absolute terms and as a rate. Monitoring these metrics as a rate accounts for differences in product penetration and facilitates comparison between products.

    Product issuers need to consider what is a reasonable range for each metric, and the timeframes for measurement, to determine when action is required.
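
    As a minimal sketch of this monitoring step – with hypothetical field names, figures and trigger level – the metrics above might be computed by distribution channel along these lines:

        import pandas as pd

        # Hypothetical quarterly experience by distribution channel
        experience = pd.DataFrame({
            "channel":           ["direct", "direct", "broker", "broker"],
            "earned_premium":    [500_000, 450_000, 300_000, 320_000],
            "claims_incurred":   [260_000, 240_000, 90_000, 110_000],
            "claims_reported":   [410, 390, 150, 160],
            "claims_rejected":   [25, 30, 28, 31],
            "policies_in_force": [4_000, 3_900, 2_500, 2_600],
            "lapses":            [380, 410, 300, 320],
        })

        by_channel = experience.groupby("channel").sum()
        by_channel["gross_loss_ratio"] = by_channel["claims_incurred"] / by_channel["earned_premium"]
        by_channel["rejected_claim_rate"] = by_channel["claims_rejected"] / by_channel["claims_reported"]
        by_channel["lapse_rate"] = by_channel["lapses"] / by_channel["policies_in_force"]

        # Flag channels breaching a (hypothetical) review trigger specified in the TMD
        LOSS_RATIO_TRIGGER = 0.40
        by_channel["review_triggered"] = by_channel["gross_loss_ratio"] < LOSS_RATIO_TRIGGER
        print(by_channel.round(3))

    In this made-up example, the broker channel’s low loss ratio would trip the review trigger and prompt a closer look at value for that class of consumers.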

    Step 3: How do I capture the opportunities associated with meeting DDO requirements?

    There are several factors critical to moving DDO requirements away from a pure compliance exercise, towards a process that provides value. These are:

    • Senior management driving the process and engaging with stakeholders across underwriting and pricing, distribution, claims management, product management and actuarial.
    • Analysis of existing data to identify frictions in the provision of products. There are a range of data sources that can be analysed to improve design outside of meeting DDO requirements – customer enquiries data, for example.
    • A robust debate around suitable metrics to measure consumer outcomes and appropriate triggers. For example, setting a loss ratio trigger, which allows a product issuer to measure whether a reasonable percentage of premium is returned to the customer. This is a useful step in checking whether products are delivering value for consumers.
    • Reporting that is modified, as necessary, to embed consumer-focused metrics. It is important that consumer-focused metrics are tracked and shared across the various functions. Well-designed monitoring should help issuers identify early trends or erosion in customer value, enabling remedial action before problems become major issues. Just as the Board receives regular financial snapshots, we advocate for the Board to receive a regular summary of customer-focused metrics.

    A well-designed approach to meeting and monitoring DDO requirements should create value for product issuers and distributors. DDO provide product issuers with a great opportunity to truly connect with their customers and their needs – and that can only be a good thing for our industry.

  • Finding the fairness in ethical machine learning

    Are we headed for a dystopian sci-fi scenario where machines make all our decisions? In this second instalment of our two-part series, we discuss what it means to be fair in ethical machine learning, the surprising ways bias creeps in and why it’s so important to keep humans in the loop.

    Governments and businesses around the world are increasingly concerned with ensuring the fair and ethical use of machine learning. Most strikingly, in the United States, Facebook and Twitter took action around the recent presidential election to flag or remove disinformation – most notably Twitter’s removal of Donald Trump’s account – setting aside the question of whether their responses were sufficient and proportionate.

    In Australia and New Zealand, we have also seen significant developments in government, including:

    • In July 2020, New Zealand released its Algorithm Charter for Aotearoa New Zealand, which is essentially a call to action for public agencies to provide New Zealanders with the confidence that algorithms (and AI) are being used sensibly and ethically. We have discussed this in a separate article, Algorithm Charter for Aotearoa: six things to be doing now
    • In August 2020, the NSW Government introduced the NSW AI Ethics Policy to guide the use of AI in the public sector. This defines mandatory ethical principles for the use of AI by NSW government agencies.

    The right strategy – where can organisations begin?

    For boards and management keen to address the issues of fairness in their data and technology, the first step is to ensure human oversight of ethical processes and a good governance structure. This helps establish the right people with responsibility for the fair application of machine learning, who can take action if problems arise. After that, there are four key considerations:

    • Fairness – Is your system as a whole fair?
    • Privacy and data ownership – Do you have permission to use the data for this particular task? Are you at risk of exposing private information?
    • Transparency – Do the users of the system and people affected by decisions or outcomes from the system understand why particular decisions are made? For high-stakes decisions, interpretable models may help (see our previous article, Interpretable machine learning: what to consider)
    • Accountability – How can those impacted by decisions understand and challenge them? Is there an avenue for redress? Who is responsible for the appropriate use of the algorithm?

    What do we mean by fairness anyway?

    Drawing on the parallels with Just War Theory (discussed in Part 1 of this series), fairness means that the overall action is justified and the way we implement it is ethical. Furthermore, many actions have positive and negative impacts so it’s important to also consider the principle of proportionality – are the benefits proportional to the possible harms?

    Thus, the first step is to decide if you are justified in using machine learning at all. Assuming you consider that you are, then the next step is to ensure that the process as a whole is implemented fairly. This in turn means you need to decide how fairness is defined in your particular context.

    Fair is not fair
    At a high level, we might say fairness means treating everyone equally. But if we drill down into the details, what do we really mean by this? Let’s suppose we want to make a decision for a group of people and we want to treat men, women and non-binary people fairly – that is, we don’t want to discriminate on the basis of gender. Suppose also that we plan to use a process to select some people for assistance, to combat disadvantage they have suffered. In order to be fair to all, does fairness mean that:

    • We give exactly the same assistance to each individual? (Also known as formal equality)
    • We give differential levels of assistance so that, overall, everyone ends up in a similar position after the intervention? (Substantive equality)

    ‘At a high level, we might say fairness means treating everyone equally … ‘


    The answer is it depends on our aims and what we want to achieve. Fairness is very context dependent. What is right in one situation may not be the fair choice in another. For example, if a bank is considering whether to give someone a loan or not, then most of the time we’d probably view equality of treatment between genders as equitable. However, often gender is important and differential treatment might actually be a fairer option than identical treatment for all.

    A Swedish lesson
    In her book Invisible Women, Caroline Criado Perez gives a fascinating example where snow-clearing in Sweden was found to discriminate against women. A gender-equality initiative in 2011 meant that the town of Karlskoga in Sweden had to evaluate all policies for gender bias. No one in the town had considered a gendered aspect to snow-clearing before, but once they did, they realised that the usual snow-clearing policy – starting with major traffic arteries and working down to pedestrian and bicycle paths at the end – did affect men and women differently.

    The difference was in their travel patterns. Basically, men are more likely to drive in a twice-daily commute (pre-COVID-19, anyway), while women are more likely to make frequent shorter journeys closer to home, on foot or by public transport, due to their greater share of unpaid care work for children and the elderly. The town councillors decided, on the basis of this difference, to prioritise clearing the footpaths and public transport areas – it’s much harder to walk through snow, especially with a buggy or pram, than to drive through it.

    The story doesn’t end there, though, because what actually happened is the authorities ended up saving money – wintertime hospital admissions for injuries were dominated by pedestrians falling in icy conditions, so these were greatly reduced. While this isn’t an example of a machine learning decision process, it does illustrate some of the nuances in considering the question of fairness for different groups.

    ‘No one had considered a gendered aspect to snow-clearing before in the town … ‘

    Defining fairness at the outset
    Getting back to your problem at hand – how to decide if your system is fair: agreeing with stakeholders what exactly you mean by fairness for the particular process or system is a critical conversation to have early on. It’s important to cast the net far and wide to try to consider as many relevant issues as possible. The answer will depend on the context and usually there will need to be some trade-offs. Unsurprisingly, there are no hard-and-fast rules for determining what type of fairness to use, but these rules of thumb may be helpful:

    • If your process is assistive, then you should be concerned with errors of exclusion – you don’t want to exclude people from receiving help in a biased way
    • Conversely, if your process is punitive, then including people in error usually leads to greater harm. (A short sketch of checking both error types by group follows this list.)
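
    As a minimal, hypothetical sketch of that distinction, the two error types can be compared across groups as follows – for an assistive process, attention falls on the exclusion (false negative) rate; for a punitive one, on the inclusion (false positive) rate. The data and group labels are made up.

        import pandas as pd

        # Made-up decisions: actual = 1 means the person should be selected; predicted = the model's call
        results = pd.DataFrame({
            "group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
            "actual":    [1, 1, 0, 0, 1, 1, 1, 0],
            "predicted": [1, 0, 0, 0, 1, 1, 0, 1],
        })

        def error_rates(df):
            excluded_in_error = ((df["actual"] == 1) & (df["predicted"] == 0)).sum() / (df["actual"] == 1).sum()
            included_in_error = ((df["actual"] == 0) & (df["predicted"] == 1)).sum() / (df["actual"] == 0).sum()
            return pd.Series({"exclusion_error_rate": excluded_in_error,
                              "inclusion_error_rate": included_in_error})

        # Large gaps between groups on the relevant rate signal a fairness problem
        print(results.groupby("group")[["actual", "predicted"]].apply(error_rates))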

    Fairness through unawareness
    Often, people suggest ‘fairness through unawareness’ as a quick fix for removing bias. This is where the modelling process simply ignores the protected characteristic. There’s a subset of cases where this will work, but you still need to collect information on the characteristic so that you can review the process for fairness. A lot of the time, however, it will fail – even if you have concluded that the fair thing is to treat genders equally, omitting gender from your model may not achieve this, because other variables may act as proxies for gender.

    Of course, fairness through unawareness is unsuitable for cases where the ethical action is to take account of differences in gender. An example of this is in medical diagnoses, where diseases can present differently in males and females.
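
    One simple check for the proxy problem described above is to see how well the remaining features can recover the protected attribute. The sketch below uses hypothetical fields and made-up data; a result well above chance suggests ‘unawareness’ won’t remove the bias.

        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        # Hypothetical applicant data, with gender excluded from the model's inputs
        data = pd.DataFrame({
            "occupation_code": [3, 1, 3, 2, 1, 3, 2, 1, 3, 2],
            "part_time":       [1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
            "annual_income":   [38, 82, 41, 67, 90, 36, 52, 75, 40, 60],  # $000s
            "gender":          ["F", "M", "F", "M", "M", "F", "F", "M", "F", "M"],
        })

        X = data.drop(columns="gender")   # the features the 'unaware' model would actually use
        y = data["gender"]

        # If these features predict gender well above chance, they are acting as proxies
        scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
        print(f"Mean accuracy recovering gender from the other features: {scores.mean():.2f}")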

    But models are objective, aren’t they?

    One argument put forward in support of using algorithms for decision-making is that they are more objective and remove subjective human decisions. The trouble is that models are built on data – essentially, they identify patterns in the input data and use these patterns to make predictions about new data. While this can be valuable when the input data is ‘clean’ and free from bias, if we feed biased data in, then the model will perpetuate these biases.

    Unfortunately, we know the world has systemic biases – if algorithms are applied at scale, they can do significant amounts of harm. Cathy O’Neil provides many examples in her book, Weapons of Math Destruction, including the United States example we mentioned in Part 1, where a US magazine unintentionally altered the tertiary education environment in the 1980s. The Swedish example, above, highlights the nuances of bias – often the presence of bias is not readily apparent.

    The problem with patterns in crime
    One of O’Neil’s examples concerns predictive policing machine learning tools, which are frequently used in the US. Some look at patterns of crime in an area and can be useful for preventing burglaries and car theft.

    However, if the systems are set up to include more minor or nuisance crimes, then fairness problems can creep in. These types of crimes generally go unreported if there are no police present to see them. While they can occur in all areas, they are usually more frequent in impoverished districts and this sets up a destructive feedback loop.

    More police are directed to these impoverished areas, in turn detecting more minor crime, which feeds into the model to direct even more police to the area. Those affected get caught up in the penal system, often for crimes that are ignored or undetected in less impoverished areas.

    A testing time ahead
    So while we live in an imperfect world, we need to consider biases in the data and structural barriers in society, and take action to address these. This isn’t easy – as illustrated by Ofqual’s algorithm for allocating A-level grades (high-school final exams) in the UK in 2020, following exam cancellation due to the COVID-19 pandemic. As discussed in our recent article When the algorithm fails to make the grade, it appears that a sincere effort was made to produce a fair process, which did tick many boxes. However, it fell short in significant areas and the backlash eventually led to the algorithm-awarded grades being scrapped.

    ‘Professor Lokke Moerel, of Tilburg University, the Netherlands, likened AI to the first cars … ‘

    Other concerns that go hand in hand with fairness

    Fairness tends to dominate discussions about ethical AI, but the other aspects of an ethical process are also important. For example, inappropriate use of data is a significant breach of trust and may breach regulations as well.

    Transparency is critical because it allows you to understand why your models make certain predictions. Not only is this important for sense-checking results, it also matters when assessing fairness at an individual level, and it can help you identify the source of unexpected results. In a wider sense, transparency also includes being very clear about what the algorithm can and can’t do, and when its results should be supplemented by human judgment.

    Finally, accountability is important to ensure a pathway is set out for decisions to be challenged and changed where appropriate. Dystopian futures in which algorithms make all decisions should stay in science fiction – keeping humans with real agency in the loop is important. The reality, however, is that ‘the algorithm says so’ is no longer confined to fiction – we already see numerous examples. For instance, earlier this year, an African American man in Michigan was wrongfully arrested and held in a detention centre for nearly 30 hours after facial recognition technology incorrectly identified him as a suspect in a shoplifting case.

    We’ve spent ages coming up with the fairest process we could think of, so we don’t have to worry about it again. Right?

    Wrong. No matter how hard you try, there will be some undetected bias in the model, or some unanticipated unfairness or unintended harms. You’ll need to closely monitor the outcomes of your system and identify ways in which it can be improved. This could be anything from minor changes to going back to the drawing board.

    In a presentation to the 2020 European Actuarial Academy conference on data science and data ethics, Professor Lokke Moerel, of Tilburg University, the Netherlands, likened AI to the first cars, which didn’t have brakes or other safety features. It was only after people had been driving them for a while that the need for brakes was realised. As time went on, more and more safety features were added. The same is true of machine learning and big data – we need to be aware when we deploy solutions that there will be failings and shortcomings, and we must be prepared to regularly address these and improve processes and systems in the interests of all those in our society, particularly the disadvantaged.

    A systemic approach to regulating machine learning and artificial intelligence

    The first-car analogy extends further. Professor Moerel also noted that the infrastructure to support the first cars, such as traffic systems and the wide use of tarmac roads, was not present but developed over time. Likewise, much of the infrastructure for machine learning is still missing – specifically, regulation to ensure it’s used in an ethical and beneficial way, rather than by a small number for their own advantage at the significant disadvantage of the majority. It’s important we all continue to have this conversation together, and take action at individual, organisational, governmental and global levels to bring about a future where AI is used to help, not hinder.