- 169
- 4 007 535
Sam Witteveen
Joined 2 Jun 2022
Hi, my name is Sam Witteveen. I have worked with Deep Learning for 9 years and with Transformers and LLMs for 5+ years. I was appointed a Google Developer Expert for Machine Learning in 2017. I currently work on LLMs and, since early 2023, on Autonomous Agents.
Florence 2 - The Best Small VLM Out There?
There is a new VLM on the scene, and it comes with a dataset of 5 billion labels. The new model can handle a variety of classic vision tasks like bounding boxes and segmentation, along with newer LLM-style tasks such as captioning.
Paper: arxiv.org/pdf/2311.06242
HF Spaces Demo: huggingface.co/spaces/gokaygokay/Florence-2
Colab : drp.li/fGyMm
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: drp.li/dIMes
👨💻Github:
github.com/samwit/langchain-tutorials (updated)
github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
00:13 Florence-2 Paper
02:19 Florence-2 Architecture
03:20 Florence-2 Detailed Image Captioning
03:41 Florence-2 Visual Grounding
04:09 Florence-2 Dense Region Caption
04:24 Florence-2 Open Vocab Detection
06:01 Hugging Face Spaces Demo
10:41 Colab Florence-2 Large Sample Usage
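Under the hood, Florence-2 represents regions as quantized location tokens (`<loc_0>`…`<loc_999>`, normalized to a 1000-bin grid over the image), which then get decoded back into pixel coordinates. A minimal sketch of that decoding step, assuming the paper's 1000-bin scheme (the function name is my own, not part of any library):

```python
import re

def parse_loc_boxes(text: str, width: int, height: int):
    """Decode Florence-2 style <loc_N> tokens (N in 0..999, normalized
    to the image size) into pixel-space [x1, y1, x2, y2] boxes."""
    nums = [int(n) for n in re.findall(r"<loc_(\d+)>", text)]
    boxes = []
    # Location tokens come in groups of four: x1, y1, x2, y2
    for i in range(0, len(nums) - 3, 4):
        x1, y1, x2, y2 = nums[i:i + 4]
        boxes.append([x1 / 1000 * width,
                      y1 / 1000 * height,
                      x2 / 1000 * width,
                      y2 / 1000 * height])
    return boxes
```

In practice the Hugging Face processor's post-processing does this conversion for you; the sketch is just to show the representation.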
Views: 7,729
Videos
Claude 3.5 beats GPT-4o!!
Views: 13K · 12 hours ago
In this video I examine Anthropic's latest version of their Claude model, Sonnet 3.5. I look at what the model can do and their new UI system called Artifacts. Blog: www.anthropic.com/news/claude-3-5-sonnet 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ...
How to save money with Gemini Context Caching
Views: 5K · 15 hours ago
Context Caching is a great way to make your Gemini calls cheaper and faster for many use cases Colab : drp.li/L7IgU 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 00:14 Google Developers Tweet 01:41 Context Caching 04:03 Demo
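To see when caching actually pays off, compare the hourly cost with and without a cache. This is a rough sketch only: the storage figure matches the $4.50 per 1M tokens per hour mentioned in the comments, but the per-token input prices are placeholders — check Google's current pricing before relying on the numbers:

```python
def caching_break_even(prompt_tokens: int,
                       calls_per_hour: float,
                       input_price_per_m: float = 3.50,    # $/1M uncached input tokens (placeholder)
                       cached_price_per_m: float = 0.875,  # $/1M cached input tokens (placeholder)
                       storage_price_per_m_hour: float = 4.50):  # $/1M tokens per hour of cache storage
    """Rough hourly cost of a repeated large prompt, with and without context caching."""
    m = prompt_tokens / 1_000_000
    without_cache = calls_per_hour * m * input_price_per_m
    with_cache = calls_per_hour * m * cached_price_per_m + m * storage_price_per_m_hour
    return without_cache, with_cache
```

With a 1M-token context and 10 calls an hour, the cached path wins comfortably; at one call an hour, the storage fee makes caching the more expensive option.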
Mesop - Google's New UI Maker
Views: 48K · 20 hours ago
Colab Getting Started: drp.li/l1j9i Colab LangChain Groq: drp.li/k0huj GitHub: github.com/google/mesop Documentation: google.github.io/mesop/ 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 01:24 Mesop Website 02:34 Mesop Demo...
Nemotron-4 340B - Need to Make a LLM Dataset?
Views: 9K · 1 day ago
In this video I talk about the new Nemotron model from Nvidia and how it goes beyond just one model to be a whole family of models that lets you generate endless amounts of free synthetic data to train your own language models. Blog: blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/ Tech Report: research.nvidia.com/publication/2024-06_nemotron-4-340b Testing the model: c...
ChatTTS - Conversational TTS Step by Step
Views: 6K · 1 day ago
Let's take a look at ChatTTS, the new conversational TTS from 2Noise, and see how you can sample speakers and add in voice effects to create high-quality output. Site: chattts.com/en Colab : drp.li/GfO6B 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-t...
Qwen 2 - For Reasoning or Creativity?
Views: 5K · 14 days ago
In this video I go through the new releases from the Qwen family of models and look at where they excel and where they perhaps aren't as good as other models out there. Colab: drp.li/ADevp 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00...
Testing Microsoft's New VLM - Phi-3 Vision
Views: 11K · 14 days ago
In this video I go through the new Phi-3 Vision model and put it through its paces to see what it can and can't do. Colab : drp.li/L8iFS HF: huggingface.co/microsoft/Phi-3-vision-128k-instruct 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stam...
5 Problems Getting LLM Agents into Production
Views: 11K · 21 days ago
In this video I discuss 5 common problems in building LLM Agents for production 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 00:58 Reliability 02:46 Excessive Loops 04:36 Tools 07:59 Self-checking 09:22 Lack of Explainabili...
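The "excessive loops" problem above has a simple first-line mitigation: cap the number of agent iterations and fail loudly instead of burning tokens forever. A minimal sketch, with a state shape and function names that are illustrative rather than taken from any particular framework:

```python
def run_agent(step, max_iters: int = 8):
    """Drive an agent step function until it reports a final answer,
    but bail out after max_iters to avoid runaway loops."""
    state = {"done": False, "answer": None, "iters": 0}
    for _ in range(max_iters):
        state = step(state)       # one reason/act cycle, returns new state
        state["iters"] += 1
        if state["done"]:
            return state["answer"]
    # Surfacing the failure beats silently looping and racking up API costs
    raise RuntimeError(f"agent exceeded {max_iters} iterations")
```

Production systems usually add token budgets and repeated-action detection on top of a plain iteration cap, but the cap alone catches the worst cases.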
Google's RAG Experiment - NotebookLM
Views: 14K · 28 days ago
notebooklm.google.com/ Blog Launch: blog.google/technology/ai/notebooklm-new-features-availability/ 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro to NotebookLM 01:09 Google Blog post: Introduction to NotebookLM 01:30 Google...
Mastering Google's VLM PaliGemma: Tips And Tricks For Success and Fine Tuning
Views: 9K · 1 month ago
Colab (code) Inference : drp.li/GVIjV Colab (code) Fine Tuning : drp.li/I0w8d HF Blog: huggingface.co/blog/paligemma HF Spaces: huggingface.co/spaces/big-vision/paligemma-hf Models : huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langcha...
Mistral's new 7B Model with Native Function Calling
Views: 15K · 1 month ago
Colab Code - drp.li/K98Z7 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 00:54 Mistral 7B-V0.3 Benchmarks 01:26 What's been added to Mistral 7B-v0.3? 01:29 Hugging Face: Mistral AI 01:42 Code Time 02:12 Running Mistral 7B-v03...
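Native function calling means the model emits a structured call against a tool schema you provide, and your own code executes it. A hedged sketch of the two halves — a JSON-schema tool definition in the shape most function-calling APIs (including Mistral's) accept, plus a local dispatcher; the exact wrapper keys can differ slightly per API:

```python
import json

# A tool description in the common JSON-schema style for function calling.
# The "get_weather" tool is a made-up example, not a real API.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch(tool_call: dict, registry: dict):
    """Execute the function named in a model's tool call against local code.
    tool_call carries the function name and a JSON string of arguments,
    mirroring how these APIs typically return calls."""
    fn = registry[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return fn(**args)
```

You would pass `[weather_tool]` in the request's tools list, then feed any returned tool call through `dispatch` and send the result back to the model as a tool message.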
Google I/O for Devs - TPUs, Gemma & GenKit
Views: 2.8K · 1 month ago
TPUv6 - cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus Gemma Updates: developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/ GenKit: firebase.google.com/docs/genkit 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutor...
Google is Finally Doing Agents
Views: 12K · 1 month ago
In this video I look at Google's new takes on Agents, how they differ from what has gone before, and what we can take away from them as developers. 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 00:07 Agents Cloud Next 01:1...
How Google is Expanding the Gemini Era
Views: 4.2K · 1 month ago
In this first of 3 videos covering Google I/O 2024 I go through the new announcements around Gemini and products using it. 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 01:24 Gemini Era 02:07 Gemini is in every Google Product...
Advanced Colab - How to go Beyond the Basics
Views: 4.1K · 1 month ago
New Summarization via In Context Learning with a New Class of Models
Views: 9K · 1 month ago
Function Calling with Local Models & LangChain - Ollama, Llama3 & Phi-3
Views: 33K · 1 month ago
Creating an AI Agent with LangGraph Llama 3 & Groq
Views: 39K · 2 months ago
Llama3 + CrewAI + Groq = Email AI Agent
Views: 53K · 2 months ago
Unlock The Gemini 1.5 Pro API (+ File API )
Views: 10K · 2 months ago
Colab 101: Your Ultimate Beginner's Guide!
Views: 4.4K · 2 months ago
Discover What's New In Gemma 1.1 Update: New 2B & 7B Instruction Tuned models
Views: 7K · 2 months ago
Master Claude 3 Haiku - The Crash Course!
Views: 19K · 2 months ago
Master CrewAI: Your Ultimate Beginner's Guide!
Views: 63K · 3 months ago
This is what people should call "small", anything below 1B! Thanks for your video. By the way, I played around with the quantized version, the result is unbelievably good! I shared a post on Twitter and mentioned you and shared the Colab. Take a look at it. I tried 8 bits and 4 bits. It's odd how 4 bits is almost the same as the base model!
I'd love to know how to do that with openwebui and a local model in a single GPU. Do we need to use FAISS or what RAG?
can i use it with vertex ai?
how to use it in Vertex-AI ?
Any idea how this works? I am trying to work out whether to use an index of relevant context or the context cache feature. It seems like the details are a closely guarded secret, which would mean the only way for me to decide between the two is to test both. The use cases seem very similar. Option 1 - Give Google a bunch of context, hope that it's good, and then run queries against it. Option 2 - index my context and add information as needed using RAG. The RAG approach would use more tokens, but at least I know how it works, so I can set my expectations. The Google approach would be cheaper, but I don't know how the context has been processed, so I can't intentionally format my data for optimal performance.
Thank you!
I'm enthusiastic about these smaller models. Thanks for covering this!
great video - thanks!
Eh, I tried "OCR with regions" using microsoft/Florence-2-large on a Firefox window, and got all menus and all tabs grouped in a single region, with the captions OCRed with a lot of spelling errors. It's nice they released it, but it's not really useful as a screen reader out-of-the-box (so results similar to Tesseract). It could perhaps do better after finetuning...
Locate unmasked person in the image. Call corona police to arrest suspect.
It would be great if you can show a finetuning example!
Great, yes, fine tune would be very interesting.
I think fine-tuning for OCR would be a good demo. OCR in the real world with images of documents is much harder than OCR on electronic documents so would be cool to see how a small model like this does as an alternative to Claude/GPT4.
I tried the OCR and OCR with region on images converted (not scanned) from PDF pages. Nothing fancy, standard text with some titles, sections, lists... it is absolutely unusable. When it detects something, it usually gets it right, but it could only see around 25% of the text.
Please do fine-tuning for Object detection
Thanks, Sam! I always appreciate your videos. I would love your take on how Florence-2 compares with Apple's 4M-21.
Where is the dataset? I couldn't find the release
It's also good at OCR for hand written documents
Would be interested in how much memory is required to run these models. They seem pretty small even unquantized. Maybe I will try it later on my 8GB M1 Mini. One thing I am curious about: at 3:38, the description for the image is wrong in ways that seem odd. The title is described as being on top with the "20 Years of ..." underneath, and Ron's tie is described as red and hair blonde. I wonder if this is just vagaries of the model (placement data would be strange) or over-reliance on training data. Or a straight-up mistake in 'creating' the paper (which would probably be the most disturbing😉).
Thanks Sam!! Please keep up the great work...
awesome, thanks
Thanks for the great content. A video going through the fine-tuning process on this one would be amazing. I am not sure how this could scale to a video implementation (probably passing a frame each time).
I also would love a video/notebook for a Florence 2 fine tune
Wow, those models are tiny. Have some engagement!
It is important to remember that there is a cost of $4.50 / 1 million tokens per hour (storage)
Hi Sam, thanks for the video. What do you think about how does it compare with Phi3-V? My take is that this is more raw and better for fine tuning, do you also think so?
this is completely better and more advanced than phi 3 v crap image detection
Thanks for the information, this is great. Can I fine-tune it for certain specific images, like few-shot learning? If you could put up a tutorial for that, I would be grateful.
Hi Sam. Thank you for the videos. I've been playing around with some of the smaller vision models and trying to implement batched inferencing with little success. If you were trying to accomplish running multiple VQA style questions against the same image quickly, how would you go about that goal? Is batching even in the right direction I should be looking?
We request you to do fine-tuning on object detection, because all LLMs are only useful for generating text output. Thanks in advance.
Thanks for your work on sharing this information. Much easier to watch your content than keep my ear to the ground all day trying to keep up. Much appreciated, sir.
I wonder how much performance would be affected when something so distilled then gets quantized? Also, it seems amazing that it can handle segmentation for an unspecified set size! With Phi3 Vision you would need to provide a token to represent, say, each giraffe you want to identify.
quantization is a good question! I would expect it to suffer more than a big model. Might give it a test tomorrow.
Vqa tutorial would be nice!
Thank you - it looks interesting:)
I did try it, it is good and works for me, different from Microsoft edge TTS.
Thank you. This is saving me a loooooooooot of time
Not taking audio in is bizarre. Any ideas why not?
This video is quite old now. The current version should be able to handle audio now.
Can a retrieval QA chain work with the memory function?
Can a retrieval chain work with the memory function? I have been trying that for a couple of days, but it doesn't work.
The only issue is that Claude 3.5 doesn't have enough usage without the pro plan to be good.
how many responses are you getting and how many do you think would be a fair amount?
@@samwitteveenai Well, honestly I think they shouldn't put a limit on conversation length at all. Just, like OpenAI does it, make it so you can only use a certain number of messages an hour. I'll take the limit for that, but to have the conversation completely cut short at any point makes the LLM unusable for any tasks other than testing. At the same time, if it gets people to sign up for the pro plan, that's business for you. I'm not entitled to have the product for free, I get that, but I'm just saying it's a demo in this state.
The thing I'm worried about is that 3.5 is extremely censored, like waaaay too much.
Thank god possibly a standardized way to develop LLM Tool-Use 🎉
Seems unnecessary
Hi Sam. Thanks, I am a newbie. I don't understand why I have to use Google Colab. Is there a difference if I use Anthropic's Claude Opus directly? Is the output the same or not? I can run the prompt with variables in Google Colab or directly in Claude Opus.
you can use it directly as long as you copy the prompt over fully etc. Colab is showing how to do it through the API
@@samwitteveenai So direct would give similar output, except that it's free, because via Colab I would have to pay for the API. Did I understand correctly?
Please provide any alternative solution for using this with open source models
This kind of thing works with the Llama3 models often just need to play with the prompts a bit
Why don't people give Dash the love it deserves? Such a powerful Python UI maker.
I used it for a while about 5 years ago, but abandoned it for streamlit. It was good for dashboards but not interactive chat etc back then. Haven't looked at it in a long time. Have they added much?
Great video. Thanks
Very useful. Thanks
Unusable due to ridiculous message limits even on the paid plan. Misleading.
what do you feel would be a fair limit?
@@samwitteveenai Well, for $20 how about unlimited? Make it $30… whatever…
What about Taipy
Taipei?
Sorry it’s Taipy
This is super useful! ❤
I actually switched to it as my primary model, it's so much better.
Claude 3.5 Sonnet is just so good at coding! Something GPT-4o doesn't understand, it gets with minimal examples, whereas GPT-4o is the opposite.