Data Science

Artificial intelligence (AI)

Artificial intelligence (AI) is a revolutionary achievement of computer science, set to become a core component of all modern software over the coming years and decades. This presents a threat but also an opportunity. AI will be deployed to augment both defensive and offensive cyber operations. Additionally, new means of cyber attack will be invented to exploit the particular weaknesses of AI technology. Finally, AI’s appetite for large amounts of training data will amplify the importance of data, redefining how we must think about data protection. Prudent governance at the global level will be essential to ensure that this era-defining technology brings about broadly shared safety and prosperity.

AI and Big Data

In general terms, AI refers to computational tools that are able to substitute for human intelligence in the performance of certain tasks. This technology is currently advancing at a breakneck pace, much like the exponential growth experienced by database technology in the late twentieth century. Databases have grown to become the core infrastructure that drives enterprise-level software. Similarly, most of the new value added from software over the coming decades is expected to be driven, at least in part, by AI.

Within the last decade, databases have evolved significantly in order to handle the new phenomenon dubbed “big data.” This refers to the unprecedented size and global scale of modern data sets, largely gathered from the computer systems that have come to mediate nearly every aspect of daily life. For instance, YouTube receives over 400 hours of video content each minute (Brouwer 2015).

AI and Cyber Security

Hardly a day passes without a news story about a high-profile data breach or a cyber attack costing millions of dollars in damages. Cyber losses are difficult to estimate, but the International Monetary Fund places them in the range of US$100–$250 billion annually for the global financial sector (Lagarde 2012). Furthermore, with the ever-growing pervasiveness of computers, mobile devices, servers and smart devices, the aggregate threat exposure grows each day. While the business and policy communities are still struggling to grasp the cyber realm’s newfound importance, the application of AI to cyber security is heralding even greater changes.

One of the essential purposes of AI is to automate tasks that previously would have required human intelligence. Cutting down on the labour resources an organization must employ to complete a project, or the time an individual must devote to routine tasks, enables tremendous gains in efficiency. For instance, chatbots can be used to field customer service questions, and medical assistant AI can be used to diagnose diseases based on patients’ symptoms.

In a simplified model of how AI could be applied to cyber defence, log lines of recorded activity from servers and network components can be labelled as “hostile” or “non-hostile,” and an AI system can be trained on this data set to classify future observations into one of those two classes. The system can then act as an automated sentinel, singling out unusual observations from the vast background noise of normal activity.
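As a minimal sketch of this simplified model, the Python below trains a multinomial naive Bayes classifier on a handful of labelled log lines and then classifies new lines as “hostile” or “non-hostile.” The text does not name an algorithm; naive Bayes is swapped in here only because it is short and transparent. The sample log lines, the `tokenize` helper and the `NaiveBayesLogClassifier` class are all hypothetical and chosen for illustration, not taken from any real system.

```python
import math
from collections import Counter

# Hypothetical labelled training data: (log line, label) pairs.
TRAINING_LOGS = [
    ("GET /index.html 200", "non-hostile"),
    ("GET /about.html 200", "non-hostile"),
    ("POST /login 200 user=alice", "non-hostile"),
    ("GET /../../etc/passwd 404", "hostile"),
    ("POST /login 401 user=admin' OR '1'='1", "hostile"),
    ("GET /cgi-bin/../../bin/sh 404", "hostile"),
]


def tokenize(line):
    """Split a log line into lowercase tokens on whitespace and common URL separators."""
    for sep in "/?=&":
        line = line.replace(sep, " ")
    return line.lower().split()


class NaiveBayesLogClassifier:
    """Multinomial naive Bayes over log-line tokens, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, examples):
        # Count lines per label, and how often each token appears under each label.
        self.class_counts = Counter(label for _, label in examples)
        self.token_counts = {label: Counter() for label in self.class_counts}
        for line, label in examples:
            self.token_counts[label].update(tokenize(line))
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, line):
        # Return the label with the highest log-posterior score for this line.
        tokens = [t for t in tokenize(line) if t in self.vocab]  # ignore unseen tokens
        total_lines = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_lines in self.class_counts.items():
            score = math.log(n_lines / total_lines)  # log prior
            denom = sum(self.token_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


classifier = NaiveBayesLogClassifier().fit(TRAINING_LOGS)
print(classifier.predict("GET /../../etc/shadow 404"))  # hostile
print(classifier.predict("GET /index.html 200"))  # non-hostile
```

In the sentinel role described above, such a classifier would score each incoming log line and raise an alert for those labelled hostile; a real deployment would need far more training data and richer features than raw tokens.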

The New Value of Data

AI technology will alter the cyber security environment in yet another way as its hunger for data changes what kind of information constitutes a useful asset, transforming troves of information that would not previously have been of interest into tempting targets for hostile actors.

While some cyber attacks aim solely to disrupt, inflict damage or wreak havoc, many intend to capture strategic assets such as intellectual property. Increasingly, aggressors in cyberspace are playing a long-term game, looking to acquire data for purposes yet unknown. The ability of AI systems to make use of even innocuous data is giving rise to the tactic of “data hoovering” — harvesting whatever information one can and storing it for future strategic use, even if that use is not well defined at present.

A recent report from The New York Times illustrates this strategy in action (Sanger et al. 2018). The report notes that the Chinese government has been implicated in the theft of personal data from more than 500 million customers of the Marriott hotel chain. Although the chief concern regarding data breaches is usually the potential misuse of financial information, in this case the information could be used to track down suspected spies by examining their travel habits, or to track and detain individuals for use as bargaining chips in other matters.

Data and AI connect, unify and unlock both intangible and tangible assets; they shouldn’t be thought of as distinct. The quantity of data is becoming a key factor for success in business, national security and even, as the Cambridge Analytica scandal shows, politics. The Marriott incident shows that relatively ordinary information can now become a strategic asset in intelligence and national defence, as AI can wring useful insights out of seemingly disparate sources of information. Therefore, this sort of bulk data will likely become a more common target for actors operating in this domain.

Machine learning (ML)

Classical machine learning is often categorized by how an algorithm learns to become more accurate in its predictions. There are two basic approaches: supervised learning and unsupervised learning. The type of algorithm a data scientist chooses depends on whether their data is labeled and on what they want to predict.

How supervised machine learning works

Supervised machine learning requires the data scientist to train the algorithm on inputs labeled with the desired outputs. Supervised learning algorithms are good for the following tasks:

  • Binary classification – dividing data into two categories.
  • Multi-class classification – choosing between more than two types of answers.
  • Regression modeling – predicting continuous values.
  • Ensembling – combining the predictions of multiple machine learning models to produce a more accurate prediction.
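Two of the supervised tasks listed above can be sketched in plain Python, assuming tiny made-up data sets: regression modeling with an ordinary least squares line fit on a single feature, and ensembling with a majority vote over several hypothetical threshold classifiers (decision stumps). The function names are illustrative only; real projects would typically rely on an established library rather than hand-written fits.

```python
from collections import Counter


def fit_linear_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums; their ratio equals covariance(x, y) / variance(x).
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_stump(threshold):
    """Return a one-rule binary classifier (decision stump) that splits at `threshold`."""
    return lambda x: "positive" if x > threshold else "negative"


def majority_vote(models, x):
    """Ensemble prediction: every model votes for a label and the most common label wins."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Regression: the made-up points lie exactly on y = 2x + 1.
slope, intercept = fit_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0

# Ensembling: three stumps with different thresholds vote on each input.
stumps = [make_stump(t) for t in (2, 3, 5)]
print(majority_vote(stumps, 4))  # positive (2 of 3 votes)
print(majority_vote(stumps, 2.5))  # negative (2 of 3 votes)
```

Using an odd number of voters for a two-class problem, as here, avoids ties in the vote.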

How unsupervised machine learning works

Unsupervised ML algorithms do not require data to be labeled. They sift through unlabeled data to look for patterns that can be used to group data points into subsets. Unsupervised learning algorithms are good for the following tasks:

  • Clustering – splitting the dataset into groups based on similarity.
  • Anomaly detection – identifying unusual data points in a dataset.
  • Association mining – identifying sets of items in a dataset that frequently occur together.
  • Dimensionality reduction – reducing the number of variables in a dataset.
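The first three unsupervised tasks above can be illustrated with small, self-contained Python sketches on made-up, unlabeled data: k-means clustering (Lloyd's algorithm with a naive first-k initialization), z-score anomaly detection, and counting co-occurring item pairs, which is the core counting step behind association-rule algorithms such as Apriori (the full algorithm is not implemented here). Function names, thresholds and data are assumptions for illustration. Dimensionality reduction is omitted because it normally relies on a linear-algebra library.

```python
import math
from collections import Counter
from itertools import combinations


def k_means(points, k, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then move centroids to cluster means."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


def z_score_anomalies(values, threshold=2.0):
    """Flag values lying more than `threshold` standard deviations from the mean."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


def frequent_pairs(transactions, min_support=2):
    """Count co-occurring item pairs and keep those seen in at least `min_support` transactions."""
    counts = Counter()
    for basket in transactions:
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (1, 1.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 1.5)], [(8, 8), (9, 8.5), (8.5, 9)]]

print(z_score_anomalies([10, 11, 9, 10, 12, 10, 11, 9, 10, 50]))  # [50]

baskets = [["bread", "milk"], ["bread", "butter", "milk"], ["beer", "chips"], ["bread", "chips", "milk"]]
print(frequent_pairs(baskets))  # {('bread', 'milk'): 3}
```

The z-score threshold is a tunable assumption: with the population standard deviation, a single outlier among n values can score at most (n - 1) / sqrt(n), which is about 2.85 for the ten values above, so a threshold of 3 could never flag anything in a sample that small.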