Sam Witteveen
  • 169
  • 4,007,535
Florence 2 - The Best Small VLM Out There?
There is a new VLM on the scene, and it comes with a dataset of 5 billion labels. The new model can handle a variety of classic vision tasks like bounding boxes and segmentation, along with newer LLM-style tasks such as captioning.
Paper: arxiv.org/pdf/2311.06242
HF Spaces Demo: huggingface.co/spaces/gokaygokay/Florence-2
Colab : drp.li/fGyMm
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: drp.li/dIMes
👨‍💻Github:
github.com/samwit/langchain-tutorials (updated)
github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
00:13 Florence-2 Paper
02:19 Florence-2 Architecture
03:20 Florence-2 Detailed Image Captioning
03:41 Florence-2 Visual Grounding
04:09 Florence-2 Dense Region Caption
04:24 Florence-2 Open Vocab Detection
06:01 Hugging Face Spaces Demo
10:41 Colab Florence-2 Large Sample Usage
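The grounding and detection tasks in the timestamps above return boxes as quantized location tokens rather than raw pixel values. A minimal decoding sketch, assuming the paper's scheme of coordinates binned into a fixed number of buckets per image dimension (the bin count, token format, and box ordering here are assumptions; check them against the Hugging Face processor's `post_process_generation` before relying on this):

```python
import re

def parse_loc_tokens(text, image_width, image_height, bins=1000):
    """Decode Florence-2 style <loc_N> tokens into pixel-space boxes.

    Assumption: each coordinate is quantized into `bins` buckets over
    the image dimension, and boxes arrive as 4 consecutive loc tokens
    (x1, y1, x2, y2).
    """
    vals = [int(v) for v in re.findall(r"<loc_(\d+)>", text)]
    boxes = []
    for i in range(0, len(vals) - 3, 4):
        x1, y1, x2, y2 = vals[i:i + 4]
        boxes.append((x1 / bins * image_width,
                      y1 / bins * image_height,
                      x2 / bins * image_width,
                      y2 / bins * image_height))
    return boxes

print(parse_loc_tokens("car<loc_52><loc_333><loc_932><loc_774>", 1280, 720))
```

In practice the model's processor does this for you; the sketch is only to show why the raw generated text looks like a stream of `<loc_…>` tokens in the Spaces demo.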
Views: 7,729

Videos

Claude 3.5 beats GPT-4o!!
Views: 13K · 12 hours ago
In this video I examine Anthropic's latest version of their Claude model, Sonnet 3.5. I look at what the model can do and their new UI system called Artifacts. Blog: www.anthropic.com/news/claude-3-5-sonnet 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ...
How to save money with Gemini Context Caching
Views: 5K · 15 hours ago
Context Caching is a great way to make your Gemini calls cost less and run faster for many use cases. Colab : drp.li/L7IgU 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 00:14 Google Developers Tweet 01:41 Context Caching 04:03 Demo
Mesop - Google's New UI Maker
Views: 48K · 20 hours ago
Colab Getting Started: drp.li/l1j9i Colab LangChain Groq: drp.li/k0huj GitHub: github.com/google/mesop Documentation: google.github.io/mesop/ 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 01:24 Mesop Website 02:34 Mesop Demo...
Nemotron-4 340B - Need to Make a LLM Dataset?
Views: 9K · 1 day ago
In this video, I talk about the new Nemotron model from Nvidia and how it goes beyond just one model to be a whole family of models that allows you to make endless amounts of free synthetic data to train your own language models. Blog: blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/ Tech Report: research.nvidia.com/publication/2024-06_nemotron-4-340b Testing the model: c...
ChatTTS - Conversational TTS Step by Step
Views: 6K · 1 day ago
Let's take a look at the new conversational TTS that has come out from 2Noise, called ChatTTS, and how you can sample speakers and add in voice effects to create high-quality speech. Site: chattts.com/en Colab : drp.li/GfO6B 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-t...
Qwen 2 - For Reasoning or Creativity?
Views: 5K · 14 days ago
In this video I go through the new releases from the Qwen family of models and look at where they excel and where perhaps they aren't as good as other models out there. Colab: drp.li/ADevp 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00...
Testing Microsoft's New VLM - Phi-3 Vision
Views: 11K · 14 days ago
In this video I go through the new Phi-3 Vision model and put it through its paces to see what it can and can't do. Colab : drp.li/L8iFS HF: huggingface.co/microsoft/Phi-3-vision-128k-instruct 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stam...
5 Problems Getting LLM Agents into Production
Views: 11K · 21 days ago
In this video I discuss 5 common problems in building LLM Agents for production 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 00:58 Reliability 02:46 Excessive Loops 04:36 Tools 07:59 Self-checking 09:22 Lack of Explainabili...
Google's RAG Experiment - NotebookLM
Views: 14K · 28 days ago
notebooklm.google.com/ Blog Launch: blog.google/technology/ai/notebooklm-new-features-availability/ 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro to NotebookLM 01:09 Google Blog post: Introduction to NotebookLM 01:30 Google...
Mastering Google's VLM PaliGemma: Tips And Tricks For Success and Fine Tuning
Views: 9K · 1 month ago
Colab (code) Inference : drp.li/GVIjV Colab (code) Fine Tuning : drp.li/I0w8d HF Blog: huggingface.co/blog/paligemma HF Spaces: huggingface.co/spaces/big-vision/paligemma-hf Models : huggingface.co/collections/google/paligemma-release-6643a9ffbf57de2ae0448dda 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langcha...
Mistral's new 7B Model with Native Function Calling
Views: 15K · 1 month ago
Colab Code - drp.li/K98Z7 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 00:54 Mistral 7B-V0.3 Benchmarks 01:26 What's been added to Mistral 7B-v0.3? 01:29 Hugging Face: Mistral AI 01:42 Code Time 02:12 Running Mistral 7B-v03...
Google I/O for Devs - TPUs, Gemma & GenKit
Views: 2.8K · 1 month ago
TPUv6 - cloud.google.com/blog/products/compute/introducing-trillium-6th-gen-tpus Gemma Updates: developers.googleblog.com/en/gemma-family-and-toolkit-expansion-io-2024/ GenKit: firebase.google.com/docs/genkit 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutor...
Google is Finally Doing Agents
Views: 12K · 1 month ago
In this video I look at Google's new takes on Agents, how they differ from what has gone before, and what we can take away as developers. 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 00:07 Agents Cloud Next 01:1...
How Google is Expanding the Gemini Era
Views: 4.2K · 1 month ago
In this first of 3 videos covering Google I/O 2024, I go through the new announcements around Gemini and products using it. 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: drp.li/dIMes 👨‍💻Github: github.com/samwit/langchain-tutorials (updated) github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 01:24 Gemini Era 02:07 Gemini is in every Google Product...
GPT-4o: What They Didn't Say!
Views: 31K · 1 month ago
Advanced Colab - How to go Beyond the Basics
Views: 4.1K · 1 month ago
New Summarization via In Context Learning with a New Class of Models
Views: 9K · 1 month ago
Function Calling with Local Models & LangChain - Ollama, Llama3 & Phi-3
Views: 33K · 1 month ago
Adding RAG to LangGraph Agents
Views: 10K · 1 month ago
Creating an AI Agent with LangGraph Llama 3 & Groq
Views: 39K · 2 months ago
Llama3 + CrewAI + Groq = Email AI Agent
Views: 53K · 2 months ago
Llama 3 - 8B & 70B Deep Dive
Views: 34K · 2 months ago
Unlock The Gemini 1.5 Pro API (+ File API )
Views: 10K · 2 months ago
Colab 101: Your Ultimate Beginner's Guide!
Views: 4.4K · 2 months ago
Discover What's New In Gemma 1.1 Update: New 2B & 7B Instruction Tuned models
Views: 7K · 2 months ago
CrewAI + Claude 3 Haiku
Views: 9K · 2 months ago
CrewAI - Building a Custom Crew
Views: 15K · 2 months ago
Master Claude 3 Haiku - The Crash Course!
Views: 19K · 2 months ago
Master CrewAI: Your Ultimate Beginner's Guide!
Views: 63K · 3 months ago

COMMENTS

  • @unclecode
    @unclecode 2 hours ago

    This is what people should call "small": anything below 1B! Thanks for your video. By the way, I played around with the quantized version, and the result is unbelievably good! I shared a post on Twitter, mentioned you, and shared the Colab. Take a look at it. I tried 8 bits and 4 bits. It's odd how 4 bits is almost the same as the base model!

  • @Maisonier
    @Maisonier 6 hours ago

    I'd love to know how to do that with openwebui and a local model on a single GPU. Do we need to use FAISS, or what RAG setup?

  • @RD-learning-today
    @RD-learning-today 11 hours ago

    Can I use it with Vertex AI?

  • @RD-learning-today
    @RD-learning-today 12 hours ago

    How do I use it in Vertex AI?

  • @matty-oz6yd
    @matty-oz6yd 12 hours ago

    Any idea how this works? I am trying to work out whether to use an index of relevant context or the context cache feature, and the details seem to be a closely guarded secret, so the only way to decide between the two would be to test both. The use cases seem very similar. Option 1: give Google a bunch of context, hope that it's good, and then run queries against it. Option 2: index my context and add information as needed using RAG. The RAG approach would use more tokens, but at least I know how it works, so I can set my expectations. The Google approach would be cheaper, but I don't know how the context has been processed, so I can't intentionally format my data for optimal performance.

  • @realCleanK
    @realCleanK 15 hours ago

    Thank you!

  • @jeremybristol4374
    @jeremybristol4374 15 hours ago

    I'm enthusiastic about these smaller models. Thanks for covering this!

  • @ylazerson
    @ylazerson 15 hours ago

    great video - thanks!

  • @clray123
    @clray123 15 hours ago

    Eh, I tried "OCR with regions" using microsoft/Florence-2-large on a Firefox window and got all menus and all tabs grouped in a single region, with the captions OCRed with a lot of spelling errors. It's nice they released it, but it's not really useful as a screen reader out of the box (so results similar to Tesseract). It could perhaps do better after fine-tuning...

  • @clray123
    @clray123 15 hours ago

    Locate unmasked person in the image. Call corona police to arrest suspect.

  • @SaiManojPrakhya-mp4oe
    @SaiManojPrakhya-mp4oe 17 hours ago

    It would be great if you could show a fine-tuning example!

  • @aa-xn5hc
    @aa-xn5hc 18 hours ago

    Great, yes, a fine-tune would be very interesting.

  • @ariramkilowan8051
    @ariramkilowan8051 19 hours ago

    I think fine-tuning for OCR would be a good demo. OCR in the real world with images of documents is much harder than OCR on electronic documents, so it would be cool to see how a small model like this does as an alternative to Claude/GPT-4.

    • @MH-ke2wi
      @MH-ke2wi 2 hours ago

      I tried OCR and OCR-with-region on images converted (not scanned) from PDF pages. Nothing fancy: standard text with some titles, sections, lists... it is absolutely unusable. When it detects something, it usually gets it right, but it could only see around 25% of the text.

  • @JustEmbraceTheChallenge
    @JustEmbraceTheChallenge 23 hours ago

    Please do fine-tuning for Object detection

  • @jefframpe5075
    @jefframpe5075 23 hours ago

    Thanks, Sam! I always appreciate your videos. I would love your take on how Florence-2 compares with Apple's 4M-21.

  • @SinanAkkoyun
    @SinanAkkoyun 1 day ago

    Where is the dataset? I couldn't find the release

  • @IsxaaqAcademy
    @IsxaaqAcademy 1 day ago

    It's also good at OCR for handwritten documents

  • @toadlguy
    @toadlguy 1 day ago

    I'd be interested in how much memory is required to run these models; they seem pretty small even unquantized. Maybe I will try it later on my 8GB M1 Mini. One thing I am curious about: at 3:38, the description of the image is wrong in ways that seem odd. The title is described as being on top with the "20 Years of ..." underneath, and Ron's tie is described as red and his hair blonde. I wonder if this is just the vagaries of the model (the placement data would be strange) or over-reliance on training data. Or a straight-up mistake in 'creating' the paper (which would probably be the most disturbing 😉).
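On the memory question above, a back-of-the-envelope sketch. The parameter counts (roughly 0.23B for Florence-2 base and 0.77B for large) are the approximate sizes reported for the models; fp16 at 2 bytes per parameter, ignoring activations and image features, is an assumption:

```python
def weights_gb(n_params, bytes_per_param=2):
    """Rough GB needed just to hold the weights (fp16 = 2 bytes/param)."""
    return n_params * bytes_per_param / 1024**3

# Approximate Florence-2 sizes: ~0.23B (base), ~0.77B (large)
for name, params in [("base", 0.23e9), ("large", 0.77e9)]:
    print(f"Florence-2 {name}: ~{weights_gb(params):.1f} GB in fp16")
```

By this estimate even the large model's weights are around 1.4 GB in fp16, so an 8GB M1 Mini should have headroom left for activations and image features.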

  • @IanScrivener
    @IanScrivener 1 day ago

    Thanks Sam!! Please keep up the great work...

  • @ALEXPREMIUMGAME
    @ALEXPREMIUMGAME 1 day ago

    awesome, thanks

  • @danielmz99
    @danielmz99 1 day ago

    Thanks for the great content. A video going through the fine-tuning process on this one would be amazing. I am not sure how this could scale to a video implementation (probably passing a frame each time).

    • @coolmcdude
      @coolmcdude 8 hours ago

      I also would love a video/notebook for a Florence 2 fine tune

  • @JonathanYankovich
    @JonathanYankovich 1 day ago

    Wow, those models are tiny. Have some engagement!

  • @eliaspereirah
    @eliaspereirah 1 day ago

    It is important to remember that there is a cost of $4.50 / 1 million tokens per hour (storage)
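A rough break-even sketch for the storage cost mentioned above. All per-token prices here are illustrative assumptions (check Google's current pricing page); the point is just the shape of the trade-off: caching swaps a per-hour storage fee for a discount on input tokens.

```python
def caching_break_even_calls(
    cached_tokens,
    input_price_per_m,            # $ per 1M uncached input tokens (assumed)
    cached_price_per_m,           # $ per 1M cached input tokens (assumed)
    storage_price_per_m_hour=4.50,  # $ per 1M tokens stored, per hour
    hours=1.0,
):
    """Calls per `hours` window at which caching starts to pay off.

    Each call saves (uncached - cached) input price on the cached
    prefix; the storage fee accrues whether or not you call.
    """
    saving_per_call = (input_price_per_m - cached_price_per_m) * cached_tokens / 1e6
    storage_cost = storage_price_per_m_hour * cached_tokens / 1e6 * hours
    return storage_cost / saving_per_call

# Hypothetical: 100k-token cached prefix, $3.50/M uncached vs $0.875/M cached
print(f"{caching_break_even_calls(100_000, 3.50, 0.875):.1f} calls/hour to break even")
```

Under these assumed prices, a couple of calls per hour against the cached context is already enough to come out ahead; below that, the storage fee dominates.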

  • @AbhishekKotecha
    @AbhishekKotecha 1 day ago

    Hi Sam, thanks for the video. How do you think it compares with Phi-3 Vision? My take is that this is more raw and better for fine-tuning; do you think so too?

    • @Walczyk
      @Walczyk 1 day ago

      this is completely better and more advanced than phi 3 v crap image detection

  • @sohitshivhare1541
    @sohitshivhare1541 1 day ago

    Thanks for the information, this is great. Can I fine-tune it for certain specific images, like few-shot learning? A tutorial for that would be greatly appreciated.

  • @tonyrungeetech
    @tonyrungeetech 1 day ago

    Hi Sam. Thank you for the videos. I've been playing around with some of the smaller vision models and trying to implement batched inference, with little success. If you were trying to run multiple VQA-style questions against the same image quickly, how would you go about it? Is batching even the right direction to be looking in?

  • @srk5702
    @srk5702 1 day ago

    We request you to do fine-tuning on object detection, because all LLMs are only useful for generating text output. Thanks in advance

  • @parkerspitzer
    @parkerspitzer 1 day ago

    Thanks for your work on sharing this information. Much easier to watch your content than keep my ear to the ground all day trying to keep up. Much appreciated, sir.

  • @mshonle
    @mshonle 1 day ago

    I wonder how much performance would be affected when something so distilled then gets quantized? Also, it seems amazing that it can handle segmentation for an unspecified set size! With Phi-3 Vision you would need to provide a token to represent, say, each giraffe you want to identify.

    • @samwitteveenai
      @samwitteveenai 1 day ago

      quantization is a good question! I would expect it to suffer more than a big model. Might give it a test tomorrow.

  • @mukkeshmckenzie7386
    @mukkeshmckenzie7386 1 day ago

    A VQA tutorial would be nice!

  • @micbab-vg2mu
    @micbab-vg2mu 1 day ago

    Thank you - it looks interesting:)

  • @leefeng5067
    @leefeng5067 1 day ago

    I did try it; it is good and works for me, different from Microsoft Edge TTS.

  • @joachimschoder
    @joachimschoder 1 day ago

    Thank you. This is saving me a loooooooooot of time

  • @Christian-go1oz
    @Christian-go1oz 1 day ago

    Not taking audio in is bizarre. Any ideas why not?

    • @samwitteveenai
      @samwitteveenai 1 day ago

      This video is quite old now; the current version should be able to handle audio.

  • @CookFu
    @CookFu 1 day ago

    can a retrieval QA chain work with the memory function?

  • @CookFu
    @CookFu 1 day ago

    Can a retrieval chain work with the memory function? I have been trying that for a couple of days, but it doesn't work.

  • @Yipper64
    @Yipper64 1 day ago

    The only issue is that Claude 3.5 doesn't allow enough usage without the Pro plan to be good.

    • @samwitteveenai
      @samwitteveenai 1 day ago

      How many responses are you getting, and how many do you think would be a fair amount?

    • @Yipper64
      @Yipper64 1 day ago

      @@samwitteveenai Well, honestly I think they shouldn't put a limit on conversation length at all. Just, like OpenAI does, make it so you can only use a certain number of messages an hour. I'll take the limit for that, but having the conversation completely cut short at any point makes the LLM unusable for any tasks other than testing. At the same time, if it gets people to sign up for the Pro plan, that's business for you. I'm not entitled to have the product for free, I get that, but I'm just saying it's a demo in this state.

  • @strikeforcealpha9343
    @strikeforcealpha9343 2 days ago

    The thing I'm worried about is that 3.5 is extremely censored, like wwwwwwaaaay too much.

  • @Cdaprod
    @Cdaprod 2 days ago

    Thank god possibly a standardized way to develop LLM Tool-Use 🎉

  • @champechilufya1458
    @champechilufya1458 2 days ago

    Seems unnecessary

  • @oruzlorte1691
    @oruzlorte1691 2 days ago

    Hi Sam. Thanks, I am a newbie. I don't understand why I have to use Google Colab. Is there a difference if I use Anthropic's Opus directly? Is the output the same or not? Will I get the prompt with variables in Google Colab, or directly in Opus?

    • @samwitteveenai
      @samwitteveenai 1 day ago

      You can use it directly as long as you copy the prompt over fully, etc. The Colab shows how to do it through the API

    • @oruzlorte1691
      @oruzlorte1691 1 day ago

      @@samwitteveenai So direct use would give similar output, except that it's free, because via Colab I would have to pay for the API. Did I understand correctly?

  • @VijayDChauhaan
    @VijayDChauhaan 2 days ago

    Please provide an alternative for using this with open-source models

    • @samwitteveenai
      @samwitteveenai 1 day ago

      This kind of thing works with the Llama 3 models; you often just need to play with the prompts a bit

  • @Mesquita2987
    @Mesquita2987 3 days ago

    Why don't people give Dash the love it deserves? Such a powerful Python UI maker.

    • @samwitteveenai
      @samwitteveenai 3 days ago

      I used it for a while about 5 years ago but abandoned it for Streamlit. It was good for dashboards but not interactive chat etc. back then. Haven't looked at it in a long time. Have they added much?

  • @miriamramstudio3982
    @miriamramstudio3982 3 days ago

    Great video. Thanks

  • @miriamramstudio3982
    @miriamramstudio3982 3 days ago

    Very useful. Thanks

  • @Strepite
    @Strepite 3 days ago

    Unusable due to ridiculous message limits, even on the paid plan. Misleading

    • @samwitteveenai
      @samwitteveenai 1 day ago

      what do you feel would be a fair limit?

    • @Strepite
      @Strepite 1 day ago

      @@samwitteveenai Well, for $20 how about unlimited? Make it $30… whatever…

  • @timmark4190
    @timmark4190 3 days ago

    What about Taipy?

  • @gemini_537
    @gemini_537 3 days ago

    This is super useful! ❤

  • @mediocreape
    @mediocreape 3 days ago

    I actually switched to it as my primary model, it's so much better

  • @24-7gpts
    @24-7gpts 4 days ago

    Claude 3.5 Sonnet is just so good at coding! Something GPT-4o doesn't understand, it gets with minimal examples, whereas GPT-4o is the opposite.