A Digital Disrupter: 2024

Writing these words here, I wonder how soon the first web crawler will arrive to pull them into a Large Language Model (LLM), and how soon they will then be used to formulate new sentences, paragraphs and articles for anyone using ChatGPT or Bard. I think it is amazing that the words I write now will shortly affect the output seen by some user of an AI tool. I have included relevant Wikipedia links below so that I need not write up the background myself; follow them should you want more detail, since this post examines a different perspective.

I suspect Bard will be the first to make use of my words written here. This is because Google owns Bard and will give its own web crawler priority in capturing the sites that it, Google, owns. (More correctly, Alphabet owns Google.) This post is being written on Blogspot, which is owned by Google. Google may hold back the ChatGPT (OpenAI, which claims to be open but is really Microsoft) and Llama (Meta, the owner of Facebook) web crawlers from getting immediate access. The whole question of allowing web crawlers access to your websites is the subject of a new commercial and political battle, most of it taking place behind the scenes. As I am in the writers and publishers domain, it is a subject I have chosen to follow closely, since it is my content they are making use of without my authorisation or, in some cases, without my approval as the legal copyright holder.
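For readers who want to see what allowing or refusing a crawler looks like in practice, here is a minimal sketch in Python. It rests on assumptions not stated in this post: that the site owner controls access through a robots.txt file, that OpenAI's crawler identifies itself as GPTBot, and that Google's AI opt-out token is Google-Extended. The example.com address is only a placeholder. Obeying robots.txt is voluntary on the crawler's part, so this shows the publisher's request, not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse the AI crawlers, keep ordinary search open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawler_permissions(robots_txt, agents, url):
    """Return a dict mapping each crawler name to True if it may fetch the url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = crawler_permissions(
    ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "Googlebot"],
    "https://example.com/post.html",  # placeholder address
)
for agent, allowed in result.items():
    print(agent, "allowed" if allowed else "blocked")
```

On a hosted platform such as Blogspot the owner may have only limited control over this file, which is part of the concern described above.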

It should be noted that the market is being flooded with AI generation tools, some with their own web crawlers and some sharing crawlers with others. They are not just text generators but can produce a variety of digital outputs covering images, audio and video. Look out for DALL-E for image creation. It reminds me of when the World Wide Web first started, with everything changing overnight and you never knew which site to view. I have used ChatGPT and achieved some impressive outcomes.

It should also be noted that most owners of web crawlers, although they allow use of the AI tools running over their LLMs, do not allow more open access to the underlying data. Meta (Facebook), with its LLM called Llama 2, appears to be taking a different approach, allowing more open access to the data sets these LLMs capture.

To conclude for now, read this article from The Times newspaper.

The article in The Times of 22/01/24, titled “Google ‘traps’ publishers in AI battle” by Katie Prescott, its Technology Business Editor, is copied below with copyright suitably acknowledged and should give you an insight into the current situation.

Copyright. The Times Newspaper.

Google ‘traps’ publishers in AI battle

Katie Prescott - Technology Business Editor

The New York Times has sued OpenAI for alleged copyright infringement, claiming the technology company used the newspaper’s information without permission.

Publishers are complaining that Google, because of its dominance in internet search, has them “between a rock and a hard place” over the use of their copyrighted output to power its artificial intelligence models.

This situation potentially gives Google an enormous advantage over its rivals, they claim, as businesses fear that blocking its AI search “crawlers” would mean they lose out on valuable traffic.

While most publishers, such as media organisations, have blocked OpenAI’s web crawler, a bot that sucks in their content to feed ChatGPT with information, they worry that barring Google’s equivalent, which supplies its Bard chatbot, would disadvantage them in the long term when it comes to making their information findable and accessible on traditional Google.

“We don’t want to do anything that results in a situation where we get less traffic in a world where Google combines AI and search,” one said, “so we’ve turned off the OpenAI crawler but we haven’t turned off the Google one.

They have us between a rock and a hard place.”

Towards the end of last year, Google said it would split its crawlers, so that publishers could choose whether to have their information scraped, or extracted, for its AI systems or merely its search engine. However, it has a new iteration, called search generative experience, or SGE, which is a hybrid of generative AI paragraphs and traditional search: this is what publishers fear will erase them from results pages should they block Google’s crawlers.

Owen Meredith, chief executive of the News Media Association, a British trade body, said: “Individual publishers inevitably will take a commercial view on whether to opt out or not, based on their individual business model. The challenge for many publishers will be the interdependency and gatekeeper role of a small number of Big Tech platforms across every part of their business, from discoverability to advertising to operating systems. Publishers may feel exposed about how Big Tech could react if they decide to opt out.”

Google says it is very aware of the importance of generative AI returning traffic to content-makers and argues that the new function will present more possibilities for people searching for information.

“As we develop LLM [large language model]-powered features, we’ll continue to prioritise experiences that send valuable traffic to the news ecosystem,” a spokeswoman said. “Our intent is for search generative experiences to serve as a jumping-off point for people to explore web content and, in fact, we are showing more links with SGE in search and links to a wider range of sources on the results page, creating new opportunities for content to be discovered.”

Generative AI burst into the public consciousness with the launch of ChatGPT in November 2022. Since then a handful of players, including OpenAI, backed by Microsoft, Google and Meta, have dominated the market, boasting the resources and computing power needed to build the large-language models that underpin the engines that can create everything from text to images in a human-like way.

Creative industries and the technology companies worldwide are clashing heads over the rights to the content used to create AI. In Britain, the Department for Science, Innovation and Technology is expected to make a ruling shortly. In addition, there are test cases under way in the courts. In the United States, The New York Times has sued OpenAI for alleged copyright infringement, claiming the technology company used the newspaper’s information to train its artificial intelligence models without permission or compensation.

In Britain, Getty Images is suing Stability AI in the High Court over copyright, claiming the latter had “unlawfully” scraped millions of images from its site.

Useful links.

Wikipedia Large Language Model

https://en.wikipedia.org/wiki/Large_language_model

Wikipedia Web Crawler

http://en.wikipedia.org/wiki/Web_crawler

Wikipedia ChatGPT

https://en.wikipedia.org/wiki/ChatGPT

Wikipedia Bard

https://en.wikipedia.org/wiki/Bard_(chatbot)

Meta Llama 2

https://ai.meta.com

Wikipedia DALL-E

https://en.wikipedia.org/wiki/DALL-E

The whole subject of Artificial Intelligence (AI) has been taken up by the media channels following the release of ChatGPT in November 2022, created by the tech start-up OpenAI. By January 2023 Microsoft had invested $10bn in the company and quickly built GPT functionality into its Bing search engine, with a promise of direct integration into Office. By March Google had responded with the launch of its AI-powered chatbot, named Bard. My concern is that the media has labelled the subject Artificial Intelligence (AI) when in fact these are the products of Machine Learning (ML).

Artificial Intelligence is a much more complex subject that was first defined in the 1950s. It was focussed upon the concept of a computer acting exactly like a human: could you be tricked into thinking the computer you were conversing with was a human? It was viewed as computational psychology, looking for a computer to emulate the richness and subtlety of our mental powers. Human intelligence is built upon cultural beliefs, individual ideas, interests, purposes, choice, self-reference and self-knowledge. Various hypotheses exist about how human mental processes use these building blocks to create what we define as intelligence. ChatGPT and Bard are only tools built upon the huge internet knowledge base and in no way reflect the complexity of Artificial Intelligence. Let us explore the historical background to the evolution of Artificial Intelligence (AI) and so see these new so-called AI tools in their correct Machine Learning (ML) category, which is just a minor branch of AI.

Artificial Intelligence is about making a computer show intelligent behaviour such as creativity, originality, autonomy and consciousness. These intelligent computers will be able to converse with humans in a natural language and understand speech and pictures. They will be computers that can learn, associate, make inferences, make decisions and otherwise behave in ways we have always considered the exclusive province of human reason. The significance of artificial intelligence was that, as a tool, it could amplify human thought. Intelligent behaviour has attributes such as deciding to search for a solution to a problem, and in so doing applying a guessing approach to cut down the ideas that need to be searched. This is followed by creating a solution and testing whether it works, then trying something else if it does not. In this way specialist knowledge is built up through problem solving.
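The search, guess, test and try-again cycle just described can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the toy problem (finding the whole number whose square is 1369) and the function names are invented for the example. The "guess" is the observation that a square ending in 9 must come from a number ending in 3 or 7, which cuts down the ideas that need to be tested.

```python
def generate_and_test(candidates, looks_promising, works):
    """Try candidates in order, skipping unpromising ones.

    Returns the first candidate that passes the test and the number
    of candidates that actually had to be tested.
    """
    attempts = 0
    for candidate in candidates:
        if not looks_promising(candidate):
            continue  # the guess rules this idea out without testing it
        attempts += 1
        if works(candidate):
            return candidate, attempts
        # it did not work, so carry on and try something else
    return None, attempts


TARGET = 1369  # hypothetical toy problem


def is_root(n):
    """The test: does this candidate actually solve the problem?"""
    return n * n == TARGET


def ends_in_3_or_7(n):
    """The guess: only such numbers can have a square ending in 9."""
    return n % 10 in (3, 7)


blind = generate_and_test(range(1, 101), lambda n: True, is_root)
guided = generate_and_test(range(1, 101), ends_in_3_or_7, is_root)
print("without the guess:", blind)
print("with the guess:", guided)
```

Both searches find the same answer, but the guided one tests far fewer candidates, which is the whole point of applying a guess before searching.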

To achieve its purpose, Artificial Intelligence needed to move forward on many fronts, including natural language understanding, robotics, image and speech understanding, cognitive modelling and theorem proving. The classic work on Artificial Intelligence is the three-volume Handbook edited by Avron Barr and Edward A. Feigenbaum from Stanford University in 1981. The volumes were digitised in 2012 and added to the Internet Archive, where they are freely available to read online or download.

The Handbook of Artificial Intelligence. Volume 1

https://archive.org/details/handbookofartific01barr/

The Handbook of Artificial Intelligence. Volume 2

https://archive.org/details/handbookofartific02barr/

The Handbook of Artificial Intelligence. Volume 3

https://archive.org/details/handbookofartific03cohe/

Edward A. Feigenbaum, sometimes referred to as the father of “Expert Systems”, is well worth reading about; see the Wikipedia entry below.

https://en.wikipedia.org/wiki/Edward_Feigenbaum

Having given you an insight into the true subject of Artificial Intelligence (AI), I should note that it evolved many specialist branches, all of which need to be fully developed before we can say we have truly created AI. Many of these branches have evolved into sub-systems that stand independently from AI. They offer their own evolutionary pathways, many of which will contribute parts to the creation of true AI. Many will go beyond human capabilities, with the inevitable danger of their becoming an independent super intelligence that could come to harm us humans. This is no longer just talk within the sci-fi (science fiction) community but has the potential to become a reality.

In my career the 1980s were full of media headlines outlining the so-called “Fifth Generation” of computers, in which Japan was to become the leader.

First Generation Vacuum Tube Computers

Second Generation Transistors

Third Generation Integrated Circuits

Fourth Generation Large Scale Integrated Computers

Fifth Generation Artificial Intelligence (Japanese lead)

In reality Japan did not become the leader, but it stirred America into responding to the threat. The Japanese approach was very Government focussed and centrally structured. The American response had some Government and military initiatives, but it was once again very commercially driven, dependent on the innovation of individuals supported by free-flowing venture capital. What the Japanese did do was define very precisely their strategic objectives for achieving the goal of Artificial Intelligence. This certainly focussed American minds, in particular that of Edward A. Feigenbaum, who wrote a book on the subject called The Fifth Generation, which became a justification for America to get its act together.

Not surprisingly, the American approach was more pragmatic, being focussed upon knowledge bases and expert systems: something less theoretical and offering immediate benefits. The practical side of implementation became the capturing of experts' knowledge into databases that could be viewed in various logical ways to derive “expert answers”. Originally this was knowledge acquired from human experts, but the streaming of masses of data from other sources soon showed the power of this type of approach. Out of these evolved Machine Learning (ML).
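To make the idea of deriving “expert answers” from captured knowledge a little more concrete, here is a minimal sketch in Python. It assumes the simplest possible design, a list of if-then rules applied repeatedly until nothing new can be concluded (usually called forward chaining). The rules and facts are invented for illustration and are not taken from any real expert system.

```python
def forward_chain(facts, rules):
    """Apply if-then rules repeatedly until no new conclusions appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge captured from an "expert" as (conditions, conclusion).
RULES = [
    (("has_fever", "has_cough"), "possible_flu"),
    (("possible_flu", "short_of_breath"), "refer_to_doctor"),
]

observed = {"has_fever", "has_cough", "short_of_breath"}
derived = forward_chain(observed, RULES)
print(sorted(derived - observed))
```

The second rule can only fire after the first has added its conclusion, which is how a chain of simple rules builds up an answer that no single rule contains.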

David Bannister (Banno)

Written in 2023

First Published on Blogger 2024

Wednesday, January 24, 2024

ZZ24002 Large Language Model (LLM) V01 240124

Wednesday, January 10, 2024

ZZ24001 Artificial Intelligence (AI) V01 100124