Skip to content Skip to footer
0 items - $0.00 0

Run structured extraction on documents/images locally with Ollama and Pydantic by EarlyOom

Run structured extraction on documents/images locally with Ollama and Pydantic by EarlyOom

Run structured extraction on documents/images locally with Ollama and Pydantic by EarlyOom

9 Comments

  • Post Author
    EarlyOom
    Posted February 20, 2025 at 1:54 am

    We put together an open-source collection of Pydantic schemas for a variety of document categories (W2 filings, invoices etc.), including instructions for how to get structured JSON responses from any visual input with the model of your choosing. Run everything locally.

  • Post Author
    jbmsf
    Posted February 20, 2025 at 3:10 am

    Interesting. We're using a SAAS solution for document extraction right now. I don't know if it's in our interest to build out more but I do like the idea of keeping extraction local.

  • Post Author
    jasonjmcghee
    Posted February 20, 2025 at 6:13 am

    I've used "structured output" (with supplied schema) on Google and openai, and function calling / tool use on those, anthropic and others- and afaict they are functionally the same (if you force a specific function / schema). Has someone had a different experience?

  • Post Author
    kaushikbokka
    Posted February 20, 2025 at 6:13 am

    Have you folks tried finetuning models for data extraction from visual data?

  • Post Author
    jauntywundrkind
    Posted February 20, 2025 at 6:18 am

    I'd really like to play with Qwen2.5-VL at some point, perhaps for reading data-sheets for microchips. Nicely for some applications, it's also very good at reporting position of what it finds, which many ML tools are pretty mediocre at. https://qwenlm.github.io/blog/qwen2.5-vl/

    Not really this application, but QvQ for visual reasoning is also impressive. https://qwenlm.github.io/blog/qvq-72b-preview/

    Meta has used Qwen as the basis for their Apollo research. https://arxiv.org/abs/2412.10360

  • Post Author
    youknowwhentous
    Posted February 20, 2025 at 6:39 am

    This seems to work for videos as well. Pretty cool demo and very nice interface for the pydantic types.

  • Post Author
    18chetanpatel
    Posted February 20, 2025 at 7:08 am

    This is something I was searching for..Thanks for creating!

  • Post Author
    joatmon-snoo
    Posted February 20, 2025 at 8:30 am

    Super cool! We at BAML had been thinking about doing something like this for our ecosystem as well – we’d love to add BAML models to this repo!

    If you haven’t heard of us, we provide a language and runtime that enable defining your schemas in a simpler syntax, and allow usage with _any_ model, not just those that implement tool calling or json mode, by by relying on schema-aligned parsing. Check it out! https://github.com/BoundaryML/baml

  • Post Author
    Inviz
    Posted February 20, 2025 at 9:49 am

    What are the most promising ways to extract information from picture like this, if the domain has strict time constraints? What's the second best way that is still fast?

Leave a comment

In the Shadows of Innovation”

© 2025 HackTech.info. All Rights Reserved.

Sign Up to Our Newsletter

Be the first to know the latest updates

Whoops, you're not connected to Mailchimp. You need to enter a valid Mailchimp API key.