Resources

The Mastermind Behind GPT-4 and the Future of AI | Ilya Sutskever

In this podcast episode, Ilya Sutskever, the co-founder and chief scientist at OpenAI, discusses his vision for the future of artificial intelligence (AI), including large language models like GPT-4. Sutskever starts by explaining the importance of AI research and how OpenAI is working to advance the field. He shares his views on the ethical considerations of AI development and the potential impact of AI on society. The conversation then moves on to large language models and their capabilities. Sutskever talks about the challenges of developing GPT-4 and the limitations of current models. He discusses the potential for large language models to generate a text that is indistinguishable from human writing and how this technology could be used in the future. Sutskever also shares his views on AI-aided democracy and how AI could help solve global problems such as climate change and poverty. He emphasises the importance of building AI systems that are transparent, ethical, and aligned with human values. Throughout the conversation, Sutskever provides insights into the current state of AI research, the challenges facing the field, and his vision for the future of AI. This podcast episode is a must-listen for anyone interested in the intersection of AI, language, and society. Timestamps: 00:04 Introduction of Craig Smith and Ilya Sutskever. 01:00 Sutskever's AI and consciousness interests. 02:30 Sutskever's start in machine learning with Hinton. 03:45 Realization about training large neural networks. 06:33 Convolutional neural network breakthroughs and imagenet. 08:36 Predicting the next thing for unsupervised learning. 10:24 Development of GPT-3 and scaling in deep learning. 11:42 Specific scaling in deep learning and potential discovery. 13:01 Small changes can have big impact. 13:46 Limits of large language models and lack of understanding. 14:32 Difficulty in discussing limits of language models. 15:13 Statistical regularities lead to better understanding of world. 16:33 Limitations of language models and hope for reinforcement learning. 17:52 Teaching neural nets through interaction with humans. 21:44 Multimodal understanding not necessary for language models. 25:28 Autoregressive transformers and high-dimensional distributions. 26:02 Autoregressive transformers work well on images. 27:09 Pixels represented like a string of text. 29:40 Large generative models learn compressed representations of real-world processes. 31:31 Human teachers needed to guide reinforcement learning process. 35:10 Opportunity to teach AI models more skills with less data. 39:57 Desirable to have democratic process for providing information. 41:15 Impossible to understand everything in complicated situations. Craig Smith Twitter: https://twitter.com/craigss Eye on A.I. Twitter: https://twitter.com/EyeOn_AI

AI, Machine Learning, Deep Learning and Generative AI Explained

Want to learn about AI agents and assistants? Register for Virtual Agents Day here → https://ibm.biz/BdaAVa Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdKSer Join Jeff Crume as he dives into the distinctions between Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and Foundation Models and how these technologies have evolved over time. He also explores the latest advancements in Generative AI, including large language models, chatbots, and deepfakes - and clarifies common misconceptions, simplifies complex concepts, and discusses the impact these technologies have on various fields. AI news moves fast. Sign up for a monthly newsletter for AI updates from IBM → https://ibm.biz/BdKSei

Grammarly AI-NLP Club #8 - Arabic Natural Language Processing: Challenges and Solutions

Speaker: Nizar Habash, an Associate Professor of Computer Science at New York University Abu Dhabi Presentation: https://www.slideshare.net/grammarly/grammarly-ainlp-club-8-arabic-natural-language-processing-challenges-and-solutions-nizar-habash Summary: The Arabic language presents a number of challenges to researchers and developers of language technologies. Arabic is both morphologically rich and highly ambiguous; and it has a number of dialects that vary widely amongst themselves and with Standard Arabic. The dialects have no official spelling standards, and spelling and grammar errors are common in unedited Standard Arabic. In this talk we present some of these challenges in detail and cover some of the ongoing efforts to address them with creative language technologies.

Steven Pinker: Linguistics as a Window to Understanding the Brain | Big Think

Burnistoun - Voice Recognition Lift

Burnistoun stars @iainconnell and @boldrobertflorence. Eleven.

GPT-3 vs Human Brain

GPT-3 has 175 billion parameters/synapses. Human brain has 100 trillion synapses. How much will it cost to train a language model the size of the human brain? REFERENCES: [1] GPT-3 paper: Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 [2] OpenAI's GPT-3 Language Model: A Technical Overview https://lambdalabs.com/blog/demystifying-gpt-3/ [3] Measuring the Algorithmic Efficiency of Neural Networks https://arxiv.org/abs/2005.04305

GPT-3: Language Models are Few-Shot Learners (Paper Explained)

#gpt3 #openai #gpt-3 How far can you go with ONLY language modeling? Can a large enough language model perform NLP task out of the box? OpenAI take on these and other questions by training a transformer that is an order of magnitude larger than anything that has ever been built before and the results are astounding. OUTLINE: 0:00 - Intro & Overview 1:20 - Language Models 2:45 - Language Modeling Datasets 3:20 - Model Size 5:35 - Transformer Models 7:25 - Fine Tuning 10:15 - In-Context Learning 17:15 - Start of Experimental Results 19:10 - Question Answering 23:10 - What I think is happening 28:50 - Translation 31:30 - Winograd Schemes 33:00 - Commonsense Reasoning 37:00 - Reading Comprehension 37:30 - SuperGLUE 40:40 - NLI 41:40 - Arithmetic Expressions 48:30 - Word Unscrambling 50:30 - SAT Analogies 52:10 - News Article Generation 58:10 - Made-up Words 1:01:10 - Training Set Contamination 1:03:10 - Task Examples https://arxiv.org/abs/2005.14165 https://github.com/openai/gpt-3 Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general. Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei Links: YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher BitChute: https://www.bitchute.com/channel/yannic-kilcher Minds: https://www.minds.com/ykilcher

BERT Explained!

This video explains the BERT Transformer model! BERT restructures the self-supervised language modeling task on massive datasets like Wikipedia. Bi-directional prediction describes masking intermediate tokens and using tokens on the left and right of the mask for predicting what was masked. This video also explores the input and output representations and how this facilitates fine-tuning the BERT transformer! Links Mentioned in Video: The Illustrated Transformer: http://jalammar.github.io/illustrated-transformer/ Tokenizers: How Machines Read: https://blog.floydhub.com/tokenization-nlp/ SQuAD: https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/ BERT: https://arxiv.org/abs/1810.04805 Thanks for watching! Please Subscribe!

עמוס עוז על השפה העברית. דיון: דר לימור שריר

Arabic Influence on Modern Hebrew!!

This video is all about how the Arabic language has influenced Modern Hebrew! 🚩 Learn Hebrew and Arabic with HebrewPod101 ( http://bit.ly/HebrewPod ) and ArabicPod101 ( http://bit.ly/arabicpod101 ). (Full disclosure: if you sign up for a premium account, Langfocus receives a small referral fee. But the free account is great too!) Special thanks to Daniel Shakarov for his Hebrew audio samples, and Ahmed Souhad for his Arabic audio samples! 🚩 Support Langfocus on Patreon: http://patreon.com/langfocus Current Patrons include: Andres Resendez Borgia, Andrew Heckenberg, Anjo Barnes, Auguste Fields, Behnam Esfahbod, Bennett Seacrist, Brandon Gonzalez, Can Cetinyilmaz, Clark Roth, Fiona de Visser, Guillermo Jimenez, Jacob Madsen, John Moffat, Marcelo Loureiro, Matthew Etter, Michael Arbagi, Michael Cuomo, Nobbi Lampe-Strang, Patrick W., Rosalind Resnick, Ruben Sanchez Jr, Sebastian Langshaw, ShadowCrossZero, Victoria Goh, Vincent David, Yuko Sunda, Adam Powell, Adam Vanderpluym, Alberto del Angel, Alen, Alex Hanselka, Ali Muhammed Alshehri, Alvin Quiñones, Andrew Woods, Angeline Biot, Aous Mansouri, Ashley Dierolf, Atsushi Yoshida, Avital Levant, Bartosz Czarnotta, Brent Warner, Brian Begnoche, Brian Morton, Bruce Stark, Carl saloga, Charis T'Rukh, Chelsea Boudreau, Christian Langreiter, Christopher Lowell, David LeCount, Debbie Levitt, Diane Young, DickyBoa, divad, Divadrax, Don Ross, Donald Tilley, Edward Wilson, Eric Loewenthal, Erin Robinson Swink, Fabio Martini, fatimahl, Grace Wagner, Gus Polly, Hannes Egli, Harry Kek, Henri Saussure, Herr K, Ina Mwanda, Jack Jackson, James and Amanda Soderling, James Lillis, Jay Bernard, Jens Aksel Takle, JESUS FERNANDO MIRANDA BARBOSA, JK Nair, JL Bumgarner, Justin Faist, Kevin J. Baron, Klaw117, Konrad, Kristian Erickson, Krzysztof Dobrzanski, Laura Morland, Lee Dedmon, Leo Coyne, Leo Barudi, Lincoln Hutton, Lorraine Inez Lil, Luke Jensen, M.Aqeel Afzal, Mahmoud Hashemi, Margaret Langendorf, Maria Comninou, Mariana Bentancor, Mark, Mark Grigoleit, Mark Kemp, Markzipan, Maurice Chou, Merrick Bobb, Michael Regal, Mike Frysinger, mimichi, Mohammed A. Abahussain, Nicholas Gentry, Nicole Tovar, Oleksandr Ivanov, Oto Kohulák, Panot, Papp Roland, Patrick smith, Patriot Nurse, Paul Shutler, Pauline Pavon, Paulla Fetzek, Peter Andersson, Peter Nikitin, Peter Scollar, Pomax, Raymond Thomas, Renato Paroni de Castro, Robert Sheehan, Robert Williams, Roland Seuhs, Ronald Brady, Ryan Lanham, Saffo Papantonopoulou, Samuel Croes, Scott Irons, Scott Russell, Sergio Pascalin, Shoji AKAO, ShrrgDas, Sierra Rooney, Simon Blanchet, Simon G, Spartak Kagramanyan, Steeven Lapointe, Stefan Reichenberger, Steven Severance, Suzanne Jacobs, Theophagous, Thomas Chapel, Tomáš Pauliček, Tryggurhavn, veleum, William MacKenzie, William O Beeman, William Shields, yasmine jaafar, Yeshar Hadi, Éric Martin. Sources include: The Renaissance of Modern Hebrew and Modern Standard Arabic: Parallels and Differences in the Revival of Two Semitic Languages. Joshua Blau. 40-42. “Arabic Loanwords in Modern Hebrew". Haseeb Shehadeh. ENCYCLOPEDIA OF HEBREW LANGUAGE AND LINGUISTICS Volume 1 (A-F). 149-152. Rasmī or aslī?: Arabic’s impact on Israeli Hebrew. D Gershon Lewental, DGLnotes, 27 January 2012. http://dglnotes.com/notes/arabic-hebrew.htm Moroccan Arabic's Influence on Modern Hebrew. "Foreigncy" podcast, Oct. 14 2018. Guest: Dr. Jonas Sibony, professor of Modern Hebrew, University of Strasbourg. Arabic Influence: Modern Period. Roni Henkin. ENCYCLOPEDIA OF HEBREW LANGUAGE AND LINGUISTICS Volume 1 (A-F). 143-149. https://www.academia.edu/6747639/Arabic_influence_Modern_period.pdf. Eliezer Ben-Yehuda Is Turning in His Grave Over Israel’s Humiliation of Arabic. Seraj Assi. https://www.haaretz.com/opinion/.premium-eliezer-ben-yehuda-is-turning-in-his-grave-over-israels-humiliation-of-arabic-1.5472510 Music: "Time Illusionist" by Asher Fulero. The following images were used under Creative Commons Sharealike 3.0 license: https://en.wikipedia.org/wiki/Afroasiatic_languages#/media/File:Hamito-Semitic_languages.jpg. Author: Listorien, Anak 1. https://commons.wikimedia.org/wiki/Category:Ashkelon#/media/File:Ashqelon2011-2.jpg. Author: Oyoyoy Still images which include the above images are available for use under the same Creative Commons Sharealike 3.0 license.

The ARABIC Language (Its Amazing History and Features)

This video is all about the Arabic language, from its early origins on the Arabian peninsula, to its current status as the 5th most spoken language on Earth. I also examine a number of features of Arabic. ▶ Learn Arabic: http://bit.ly/arabicpod101 ◀ (Full disclosure: if you sign up for a paid membership, Langfocus receives a small referral fee.) Special thanks to Murjana Shabaneh and Mohammad Abd Al Qadr for the audio samples and feedback! 🔹🔷 Check out Langfocus on Patreon http://patreon.com/langfocus 🔷🔹 Current Patreon members include these fantastic people: Brandon Gonzalez, Виктор Павлов, Mark Thesing, Jiajun "Jeremy" Liu, иктор Павлов, Guillermo Jimenez, Sidney Frattini Junior, Bennett Seacrist, Ruben Sanchez, Michael Cuomo, Eric Garland, Brian Michalowski, Sebastian Langshaw, Vadim Sobolev, FRANCISCO, Mohammed A. Abahussain, Fred, UlasYesil, JL Bumgarner, Rob Hoskins, Thomas A. McCloud, Ian Smith, Maurice Chow, Matthew Cockburn, Raymond Thomas, Simon Blanchet, Ryan Marquardt, Sky Vied, Romain Paulus, Panot, Erik Edelmann, Bennet, James Zavaleta, Ulrike Baumann, Ian Martyn, Justin Faist, Jeff Miller, Stephen Lawson, Howard Stratton, George Greene, Panthea Madjidi, Nicholas Gentry, Sergios Tsakatikas, Bruno Filippi, Sergio Tsakatikas, Qarion, Pedro Flores, Raymond Thomas, Marco Antonio Barcellos Junior, David Beitler, Rick Gerritzen, Sailcat, Mark Kemp, Éric Martin, Leo Barudi, Piotr Chmielowski, Suzanne Jacobs, Johann Goergen, Darren Rennels, Caio Fernandes, Iddo Berger, Peter Nikitin, Brent Werner, Fiona de Visser, Carl Saloga, Edward Wilson, Kevin Law, David Lecount, Joshua Philgarlic, for their generous Patreon support. Video chapters: 00:00 Introduction 00:32 General Information about the Arabic Language 01:07 Varieties of Arabic 02:06 Arabic is Semitic language 02:22 Old Arabic 03:51 Classical Arabic 05:04 Neo-Arabic & Middle Arabic 06:02 Modern Arabic 06:47 Diglossia in Arabic 08:21 The Arabic script 09:24 Arabic phonology 10:30 Morphology in the Arabic language 11:36 Verbs in Arabic 13:05 Word order in Arabic 14:00 Cases in Arabic 15:05 Sentence breakdown 16:30 Final comments 17:22 The Question of the Day Music: You're free to use this song and monetize your video, but you must include the following in your video description: Ibn Al-Noor by Kevin MacLeod is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/) Source: http://incompetech.com/music/royalty-free/index.html?isrc=USUAN1100706 Artist: http://incompetech.com/ "Raw Deal" by Gunnar Olsen. "In Case You Forgot" by Otis McDonald. Drum beat from: https://www.youtube.com/watch?v=fVvWgpBHNL0 Images: "Arabic Speaking World" map courtesy of Keteracel at English Wikipedia. https://commons.wikimedia.org/wiki/File:Arabic_speaking_world.svg

The Bakery - האקדמיה ללשון העברית

Natural Language Processing

Natural Language Processing is a field of Artificial Intelligence dedicated to enabling computers to understand and communicate in human language. NLP is only a few decades old, but we've made significant progress in that time. I'll cover how its changed over the years, then show you how you can easily build an NLP app that can either classify or summarize text. This is incredibly powerful technology that anyone can freely use, I'll show you how to do it. Enjoy! Code for this video: https://github.com/llSourcell/bert-as-service Please Subscribe! And like. And comment. That's what keeps me going. Want more education? Connect with me here: Twitter: https://twitter.com/sirajraval Facebook: https://www.facebook.com/sirajology instagram: https://www.instagram.com/sirajraval More Learning resources: https://www.youtube.com/watch?v=0n95f-eqZdw http://mlexplained.com/2019/01/30/an-in-depth-tutorial-to-allennlp-from-basics-to-elmo-and-bert/ https://towardsdatascience.com/beyond-word-embeddings-part-2-word-vectors-nlp-modeling-from-bow-to-bert-4ebd4711d0ec https://gluon-nlp.mxnet.io/examples/sentence_embedding/bert.html Join us in the Wizards Slack channel: http://wizards.herokuapp.com/ Join us at the School of AI: https://theschool.ai/ And please support me on Patreon: https://www.patreon.com/user?u=3191693 Signup for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w Hit the Join button above to sign up to become a member of my channel for access to exclusive content! Join my AI community: http://chatgptschool.io/ Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz

אברהם שלונסקי על "המחשב האלקטרוני"

אברהם שלונסקי, מהחשובים מהמשוררים בעברית בישראל, בתיעוד נדיר ובודד מ-1968. הוא מתראיין לאבשלום קור ומיכל גוברין ונותן את דעתו על המחשב האלקטרוני והאדם בעידן הטכנולוגי. את הראיון המלא ניתן לראות כאן: http://23tv.co.il/1487-he/Tachi.aspx

Consolidating and Exploring Open Textual Knowledge Prof. Ido Dagan, Bar Ilan University >> Click here

מבוא לשפה - עיבוד ממוחשב של שפה אנושית עם פרופסור עידו דגן With Spotify

START WIT NLP

Start with NLP

Recommended textbook, available online:
https://web.stanford.edu/~jurafsky/slp3/

It also provides great little introductions to many fields of linguistics before you hop into the computational part.

NLP Tutorials Part -I from Basics to Advance

https://www.analyticsvidhya.com/blog/2022/01/nlp-tutorials-part-i-from-basics-to-advance/

Hebrew NLP resources

Hebrew NLP Resources

https://github.com/NNLP-IL/Resources

מאגרי מידע ושת"פים אפשריים
https://docs.google.com/spreadsheets/d/1fGYKyA5Jf_KPCXPCpRWGfRzjDc6ALp9dgKnbIXqxM_Y/edit#gid=0

חוות דעת: שימושים בתכנים מוגנים בזכויות יוצרים לצורך למידת מכונה

https://www.gov.il/he/departments/legalInfo/machine-learning

Open source

Open Source

Github

NLP
https://github.com/topics/natural-language-processing

Speech

https://github.com/topics/speech

spaCy · Industrial-strength Natural Language Processing in Python
https://spacy.io/

Stanza – A Python NLP Package for Many Human Languages

Created by the Stanford NLP Group

https://stanfordnlp.github.io/stanza/a

Unsupervised

Large language model (LLM)

Open LLMs List
https://github.com/eugeneyan/open-llms

What’s before GPT-4? A deep dive into ChatGPT

https://medium.com/digital-sense-ai/whats-before-gpt-4-a-deep-dive-into-chatgpt-dfce9db49956

GPT-4 Training process

Like previous GPT models, the GPT-4 base model was trained to predict the next word in a document, and was trained using publicly available data (such as internet data) as well as data we’ve licensed. The data is a web-scale corpus of data including correct and incorrect solutions to math problems, weak and strong reasoning, self-contradictory and consistent statements, and representing a great variety of ideologies and ideas.

So when prompted with a question, the base model can respond in a wide variety of ways that might be far from a user’s intent. To align it with the user’s intent within guardrails, we fine-tune the model’s behavior using reinforcement learning with human feedback (RLHF).

Note that the model’s capabilities seem to come primarily from the pre-training process—RLHF does not improve exam performance (without active effort, it actually degrades it). But steering of the model comes from the post-training process—the base model requires prompt engineering to even know that it should answer the questions.

https://openai.com/research/gpt-4

BERT
https://github.com/google-research/bert

AlephBERT

https://github.com/OnlpLab/AlephBERT
https://arxiv.org/pdf/2104.04052.pdf

Multi-language Aspects
How Language-Neutral is Multilingual BERT?
https://arxiv.org/pdf/1911.03310.pdf

AraBERT: Transformer-based Model for Arabic Language Understanding
https://arxiv.org/pdf/2003.00104.pdf

ELMo
https://allennlp.org/elmo