LONDON, July 19 (Reuters) - Elements of Friday's global
IT outage, which grounded planes and hit services from banking
to healthcare, have occurred before, and until more contingencies
are built into networks and organisations put better back-up
plans in place, they will occur again.
Friday's outage was caused by an update that U.S.
cybersecurity firm CrowdStrike pushed to its clients
early on Friday morning, which conflicted with Microsoft's
Windows operating system, rendering devices around the
world inoperable.
CrowdStrike has one of the largest shares of the highly
competitive cybersecurity market that provides such tools,
leading some industry analysts to question whether control over
such operationally critical software should remain in the hands
of just a handful of companies.
But the outage has also raised concerns among experts that
many organisations are not well-prepared to implement
contingency plans when a single point of failure such as an IT
system, or a piece of software within it, goes down.
At the same time there are also more solvable digital
disasters looming on the horizon, with perhaps the biggest
global IT challenge since the Millennium Bug, the "2038
Problem", just under 14 years away - and, this time, the world
is far more dependent on computers.
"It's easy to jump at the idea that this is disastrous and
therefore suggest there must be a more diverse market and, in an
ideal world, that's what we'd have," said Ciaran Martin, former
head of Britain's National Cyber Security Centre (NCSC), part of
the country's GCHQ intelligence agency.
"We're actually good at managing the safety aspects of tech
when it comes to cars, trains, planes, and machines. What we're
bad at is then providing services," he added.
"Look at what happened to the London health system a few
weeks ago - they were hacked, and that led to loads of cancelled
operations, which is physically dangerous," he said, referring
to a recent ransomware incident which affected Britain's
National Health Service (NHS).
Organisations need to look around their IT systems, Martin
said, and ensure there are enough failsafes and redundancies in
those systems to stay operational in the event of an outage.
Friday's outage happened amid a perfect storm, with both
Microsoft and CrowdStrike owning huge shares of a market which
relies on both of their products.
"I'm sure the regulators globally are looking at this. There
is limited competition globally for operating systems, for
example, and also for the large scale cybersecurity products
like the ones CrowdStrike provides," said Nigel Phair, a
cybersecurity professor at Australia's Monash University.
Friday's outage hit airlines particularly hard, as many
scrambled to check in and board passengers who relied upon
digital tickets to fly. Some travellers posted photos on social
media of hand-written boarding cards provided by airline staff.
Others were only able to fly if they had printed out their
ticket.
"I think it's very important for organisations of all shapes
and sizes to really look at their risk management and look at an
all-hazards approach," Phair said.
EPOCHALYPSE NOW
Friday's outage will not be the last time the world is
reminded of its dependency on computers and IT products for
basic services to function. In about 14 years' time, the world
will be faced with a time-based computer issue similar to the
Millennium Bug called the "2038 Problem".
The Millennium Bug, or "Y2K", happened because early
computers saved expensive memory space by only counting the last
two digits of the year, meaning many systems were unable to
distinguish between the year 1900 and 2000, leading to critical
errors.
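As a rough illustration of the two-digit shortcut described above, the sketch below computes an age from two-digit years. The function names and the birth year are hypothetical, chosen only for this example; real systems of the era varied widely in how they stored and compared dates.

```python
def to_two_digits(year):
    """Keep only the last two digits of a year, as many early systems did."""
    return year % 100


def years_elapsed(start_yy, end_yy):
    """Naive elapsed-years calculation on two-digit years (hypothetical helper)."""
    return end_yy - start_yy


# 1900 and 2000 become indistinguishable once truncated.
print(to_two_digits(1900) == to_two_digits(2000))  # True

# Age of someone born in 1965, computed in 1999 and then in 2000.
age_1999 = years_elapsed(to_two_digits(1965), to_two_digits(1999))
age_2000 = years_elapsed(to_two_digits(1965), to_two_digits(2000))
print(age_1999)  # 34
print(age_2000)  # -65
```

The negative age in the second case is the kind of critical error that had to be found and fixed before the year 2000.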
The cost to mitigate the problem in the years before 2000
ran up a global bill of hundreds of billions of dollars.
The 2038 problem, or "Epochalypse", which begins at 0314 GMT
on Jan. 19, 2038, is, in essence, the same problem.
Many computers count the passage of time by measuring the
number of seconds since midnight on Jan. 1, 1970, also known as
the "Epoch".
Those seconds are stored as a finite sequence of zeroes and
ones, or "bits", but on many computers the counter is a signed
32-bit number, which reaches its maximum value in 2038.
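The arithmetic behind that date can be checked with a short sketch. It assumes the counter is a signed 32-bit integer that wraps around on overflow, which is the commonly described failure mode; the helper `wrap_int32` is a hypothetical stand-in for that behaviour, not code from any affected system.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def wrap_int32(n):
    """Simulate storing n in a signed 32-bit integer (two's complement wraparound)."""
    return (n + 2**31) % 2**32 - 2**31


# Last moment the counter can represent, and where it lands one second later.
last_moment = EPOCH + timedelta(seconds=INT32_MAX)
after_overflow = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_moment.isoformat())     # 2038-01-19T03:14:07+00:00
print(after_overflow.isoformat())  # 1901-12-13T20:45:52+00:00
```

Under these assumptions, a system that wraps in this way would read the next second as a date in December 1901 rather than January 2038.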
"We currently have a situation where there's huge global
disruption, because we cannot cope administratively," said
Ciaran Martin, the former NCSC head.
"We can cope in terms of safety, but we can't cope in terms
of service provision when key networks go down."