Tar9897
about an hour ago
I believe that to make models reach human-level learning, serious students can start by developing an intelligent neuromorphic agent. We build such an agent and have it learn grammar patterns and word categories through symbolic representations, after which we delve into teaching it the remaining rules of the language.

In parallel with grammar learning, the agent would use language-grounding techniques to link words to their sensory representations and abstract concepts, so that it learns word meanings, synonyms, antonyms, and semantic relationships from both textual data and perceptual experiences.

The result would be an agent with a rich lexicon and conceptual knowledge base underlying its language understanding and generation. With this basic knowledge of grammar and word meanings, the agent can then learn to combine words and phrases to express specific ideas or concepts. Building on this, it would learn to generate complete sentences, continuously refining and improving them. Eventually it would learn to generate sequences of sentences in the form of dialogues or narratives, taking into account context, goals, and user feedback.

I believe that by gradually learning to improve its responses, the agent would acquire the ability to generate coherent, meaningful, and contextually appropriate language. This would allow it to reason without hallucinating, something LLMs struggle with.
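A minimal sketch of what the first symbolic stage might look like. This is purely illustrative: the lexicon entries, category names, and rule format are assumptions, not a real implementation of the agent described above.

```python
# Illustrative only: a tiny symbolic lexicon and one grammar rule.
LEXICON = {
    "the": "DET",
    "dog": "NOUN", "cat": "NOUN",
    "runs": "VERB", "sleeps": "VERB",
}

# S -> DET NOUN VERB, stored as a symbolic pattern the agent could extend.
GRAMMAR = [("S", ["DET", "NOUN", "VERB"])]

def parse(tokens):
    """Return the name of the matching rule, or None if no rule fits."""
    cats = [LEXICON.get(t) for t in tokens]
    for name, pattern in GRAMMAR:
        if cats == pattern:
            return name
    return None

def learn_word(word, category):
    # Grounding is reduced to a symbolic association here; the agent in the
    # post would additionally link the word to sensory representations.
    LEXICON[word] = category
```

Because both the lexicon and the grammar are explicit symbols, every parse is fully inspectable, which is the property that makes this style of agent resistant to hallucination.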

Developing such agents would not require much compute, and the code would be simple and easy to understand. It would introduce everyone to symbolic AI and to building agents that are good at reasoning tasks, thus addressing a crucial problem with LLMs. We have used a similar architecture to make our model learn continually. Do sign up as we start opening access next week at
    tonywu71
    about 3 hours ago
    ColPali: A new approach to efficient and intelligent document retrieval πŸš€

    Our latest research paper, "ColPali: Efficient Document Retrieval with Vision Language Models," introduces a groundbreaking approach to large-scale visual document analysis. By leveraging Vision Language Models (VLMs), we have created a new framework for document retrieval that's both powerful and efficient.

    Key Insights:
    πŸ’‘ ColPali combines ColBERT's multi-vector strategy with VLMs' document understanding capabilities
    βš™οΈ ColPali is based on PaliGemma-3B (SigLIP, Gemma-2B) + a linear projection layer and is trained to maximize the similarity between the document and the query embeddings
    πŸ“Š The Vision Document Retrieval benchmark (ViDoRe) is a challenging dataset that spans various industry topics and aims at matching real-life retrieval scenarios
    πŸ† ColPali outperforms existing models on all datasets in ViDoRe (average NDCG@5 of 81.3% vs 67.0% for the best baseline model)
    ⚑ ColPali is faster at document embedding compared to traditional PDF parser pipelines, making ColPali viable for industrial use
    πŸ” ColPali is highly interpretable thanks to patch-based similarity maps

    Dive deeper into ColPali and explore our resources:
    πŸ“‘ Full paper: arxiv.org/abs/2407.01449
    πŸ› οΈ Datasets, model weights, evaluation code, leaderboard, demos: huggingface.co/vidore

    Shoutout to my amazing co-authors Manuel Faysse ( @manu ) and Hugues Sibille ( @HugSib ). We are grateful for the invaluable feedback from Bilel Omrani, Gautier Viaud, Celine Hudelot, and Pierre Colombo. This work is sponsored by ILLUIN Technology. ✨
      DmitryRyumin
      about 7 hours ago
      πŸš€πŸ•ΊπŸŒŸ New Research Alert (Avatars Collection)! πŸŒŸπŸ’ƒπŸš€
      πŸ“„ Title: Expressive Gaussian Human Avatars from Monocular RGB Video πŸ”

      πŸ“ Description: The new EVA model enhances the expressiveness of digital avatars by using 3D Gaussians and SMPL-X to capture fine-grained hand and face details from monocular RGB video.

      πŸ‘₯ Authors: Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, and Zhangyang Wang

      πŸ“„ Paper:

      🌐 Github Page:
      πŸ“ Repository:

      πŸš€ CVPR-2023-24-Papers:

      πŸš€ WACV-2024-Papers:

      πŸš€ ICCV-2023-Papers:

      πŸ“š More Papers: more cutting-edge research presented at other conferences, curated by @DmitryRyumin

      πŸš€ Added to the Avatars Collection:

      πŸ” Keywords: #DigitalAvatars #3DModeling #ComputerVision #MonocularVideo #SMPLX #3DGaussians #AvatarExpressiveness #HandTracking #FacialExpressions #AI #MachineLearning
      iofu728
      about a day ago
      Welcome to MInference, which leverages the dynamic sparse nature of LLM attention, whose patterns are partly static, to speed up pre-filling for million-token LLMs. It first determines offline which sparse pattern each head belongs to, then approximates the sparse index online and dynamically computes attention with optimal custom kernels. This approach achieves up to a 10x pre-filling speedup on an A100 while maintaining accuracy at 1M tokens.
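A toy illustration of the dynamic-sparse idea (not MInference's actual head patterns or custom kernels): score key columns with a cheap proxy, keep only the strongest ones, and run softmax attention over that subset.

```python
import numpy as np

def sparse_prefill_attention(q, k, v, keep=8):
    """Toy dynamic-sparse attention: score key columns with a cheap proxy
    (here the last query row), select the `keep` strongest columns online,
    and attend only over that subset. Causal masking is omitted."""
    proxy = q[-1] @ k.T                           # cheap per-column score
    cols = np.argsort(proxy)[-keep:]              # online sparse index
    scores = q @ k[cols].T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v[cols]
```

With `keep` equal to the sequence length this reduces to dense attention; the savings come from keeping far fewer columns than the full context at million-token lengths.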

      For more details, please check:
      project page:
      code:
      paper:
      hf demo:
      fdaudens
      about a day ago
      New dataset filtering feature just dropped! πŸ€—πŸš€

      Find exactly what you need with filters for:
      - Modalities (text, image, audio, etc.)
      - Dataset size
      - File format
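A rough sketch of how such facet filters compose, using hypothetical metadata records rather than the Hub's real API:

```python
# Hypothetical metadata records standing in for Hub dataset cards.
DATASETS = [
    {"name": "squad",     "modalities": {"text"},          "size_mb": 90,     "format": "parquet"},
    {"name": "laion-art", "modalities": {"image", "text"}, "size_mb": 120000, "format": "parquet"},
    {"name": "libri",     "modalities": {"audio"},         "size_mb": 60000,  "format": "arrow"},
]

def filter_datasets(records, modality=None, max_size_mb=None, file_format=None):
    # Each facet is applied only when the caller sets it, mirroring the UI.
    out = []
    for r in records:
        if modality and modality not in r["modalities"]:
            continue
        if max_size_mb is not None and r["size_mb"] > max_size_mb:
            continue
        if file_format and r["format"] != file_format:
            continue
        out.append(r)
    return out
```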

      Try it now:

      What other filters would you find useful? Drop your ideas!
        gokaygokay
        about a day ago
        I've created a space for chatting with Gemma 2 using llama.cpp

        - πŸŽ›οΈ Choose between 27B IT and 9b IT models
        - πŸš€ Fast inference using llama.cpp

        -
          tomaarsen
          about a day ago
          @Omartificial-Intelligence-Space has trained and released 6 Arabic embedding models for semantic similarity. 4 of them outperform all previous models on the STS17 Arabic-Arabic task!

          πŸ“š Trained on a large dataset of 558k Arabic triplets translated from the AllNLI triplet dataset:
          6️⃣ 6 different base models: AraBERT, MarBERT, LaBSE, MiniLM, paraphrase-multilingual-mpnet-base, mpnet-base, ranging from 109M to 471M parameters.
          πŸͺ† Trained with a Matryoshka loss, allowing you to truncate embeddings with minimal performance loss: smaller embeddings are faster to compare.
          πŸ“ˆ Outperforms all commonly used multilingual models like , , and .
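The Matryoshka truncation trick can be sketched like this (illustrative only; real usage would go through sentence-transformers):

```python
import numpy as np

def truncate_and_normalize(emb, dim):
    """Keep only the leading `dim` dimensions and re-normalize: Matryoshka
    training concentrates signal in the prefix, so the cut costs little."""
    cut = emb[..., :dim]
    return cut / np.linalg.norm(cut, axis=-1, keepdims=True)

def cosine(a, b):
    # Assumes unit-normalized inputs, as produced above.
    return float(a @ b)
```

Smaller truncated embeddings shrink both index storage and per-comparison cost, which is what makes the minimal-performance-loss trade-off attractive.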

          Check them out here:
          -
          -
          -
          -
          -
          -
          Or the collection with all:

          My personal favourite is likely : a very efficient 135M parameters & scores #1 on .
          alvdansen
          about a day ago
          I really like what the @jasperAITeam designed with Flash LoRA. It works really well for something that generates so quickly, and I'm excited to test it with AnimateDiff: I recently tested LCM on its own for AD and the results were already promising.

          I put together my own page of models using their code and LoRA. Enjoy!

