![Tar9897](https://cdn-avatars.huggingface.co/v1/production/uploads/66055c33d0703e48e206c606/VPBpTh06gJ6pZ5bgQcUQJ.png)
about an hour ago
In parallel with grammar learning, the agent would also use language grounding techniques to link words to their sensory representations and abstract concepts. In this way, the agent learns word meanings, synonyms, antonyms, and semantic relationships from both textual data and perceptual experiences.
The result would be the agent developing a rich lexicon and conceptual knowledge base that underlies both its language understanding and its generation. With this basic knowledge of grammar and word meanings, the agent can then learn to combine words and phrases to express specific ideas or concepts. Building on this, it would learn to generate complete sentences, which it would continuously refine and improve. Eventually it would learn to generate sequences of sentences in the form of dialogues or narratives, taking context, goals, and user feedback into account.
I believe that by gradually learning how to improve its responses, the agent would also acquire the ability to generate coherent, meaningful, and contextually appropriate language. This would let it reason without hallucinating, something LLMs struggle with.
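To make the grounding idea concrete, here is a toy sketch (illustrative only; the lexicon vectors and helper functions are invented for this example, not our actual system):

```python
import numpy as np

# Toy grounded lexicon: each word maps to an invented sensory feature vector.
lexicon = {
    "red":  np.array([1.0, 0.0, 0.0, 0.2]),
    "blue": np.array([0.0, 0.0, 1.0, 0.2]),
    "ball": np.array([0.1, 0.1, 0.1, 1.0]),
}

def ground_phrase(words):
    """Compose a phrase meaning by summing the grounded word vectors."""
    return sum(lexicon[w] for w in words)

def describe(percept, candidates):
    """Generation step: pick the phrase whose grounding best matches a percept."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(candidates, key=lambda ws: cos(ground_phrase(ws), percept))

percept = np.array([0.9, 0.1, 0.1, 1.1])  # an observation resembling a red ball
print(describe(percept, [("red", "ball"), ("blue", "ball")]))  # ('red', 'ball')
```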
Developing such agents would not require much compute, and the code would be simple and easy to understand. It would introduce everyone to symbolic AI and to building agents that are good at reasoning tasks, addressing a crucial problem with LLMs. We have used a similar architecture to make our model learn constantly. Do sign up as we start opening access next week at
![tonywu71](https://cdn-avatars.huggingface.co/v1/production/uploads/1650784534234-noauth.png)
about 3 hours ago
Our latest research paper, "ColPali: Efficient Document Retrieval with Vision Language Models," introduces a groundbreaking approach to large-scale visual document analysis. By leveraging Vision Language Models (VLMs), we have created a new framework for document retrieval that's both powerful and efficient.
Key Insights:
💡 ColPali combines ColBERT's multi-vector strategy with VLMs' document understanding capabilities (a toy scoring sketch follows these insights)
⚙️ ColPali is based on PaliGemma-3B (SigLIP, Gemma-2B) + a linear projection layer and is trained to maximize the similarity between the document and the query embeddings
📊 The Vision Document Retrieval benchmark (ViDoRe) is a challenging dataset that spans various industry topics and aims at matching real-life retrieval scenarios
🏆 ColPali outperforms existing models on all datasets in ViDoRe (average NDCG@5 of 81.3% vs 67.0% for the best baseline model)
⚡ ColPali is faster at document embedding compared to traditional PDF parser pipelines, making ColPali viable for industrial use
🔍 ColPali is highly interpretable thanks to patch-based similarity maps
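For intuition, here is a minimal sketch of ColBERT-style late-interaction (MaxSim) scoring of the kind ColPali uses; the tensor shapes are illustrative, and this is not our official implementation:

```python
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """Late interaction: each query token is matched to its most similar
    document patch, and the best matches are summed over query tokens.

    query_emb: (num_query_tokens, dim), doc_emb: (num_doc_patches, dim),
    both assumed L2-normalized so dot products are cosine similarities.
    """
    sim = query_emb @ doc_emb.T          # (num_query_tokens, num_doc_patches)
    return sim.max(dim=1).values.sum()   # MaxSim over patches, summed over tokens

# Toy example with random embeddings
q = torch.nn.functional.normalize(torch.randn(16, 128), dim=-1)
d = torch.nn.functional.normalize(torch.randn(1024, 128), dim=-1)
print(maxsim_score(q, d))
```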
Dive deeper into ColPali and explore our resources:
📄 Full paper: arxiv.org/abs/2407.01449
🛠️ Datasets, model weights, evaluation code, leaderboard, demos: huggingface.co/vidore
Shoutout to my amazing co-authors Manuel Faysse ( @manu ) and Hugues Sibille ( @HugSib ). We are grateful for the invaluable feedback from Bilel Omrani, Gautier Viaud, Celine Hudelot, and Pierre Colombo. This work is sponsored by ILLUIN Technology. ✨
![ezgikorkmaz](https://cdn-avatars.huggingface.co/v1/production/uploads/667c1a5acb6800a191024eb9/AqL8mQZsZjpZKi9FxtkIH.png)
about 4 hours ago
A Survey Analyzing Generalization in Deep Reinforcement Learning
Paper:
GitHub:
![DmitryRyumin](https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/nRCxbVng_PPBqKd-Z3KVc.jpeg)
about 7 hours ago
🚀 Title: Expressive Gaussian Human Avatars from Monocular RGB Video
📝 Description: The new EVA model enhances the expressiveness of digital avatars by using 3D Gaussians and SMPL-X to capture fine-grained hand and face details from monocular RGB video.
👥 Authors: Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, and Zhangyang Wang
📄 Paper:
🌐 Github Page:
📁 Repository:
📚 CVPR-2023-24-Papers:
📚 WACV-2024-Papers:
📚 ICCV-2023-Papers:
📚 More Papers: more cutting-edge research presented at other conferences in the collection curated by @DmitryRyumin
🚀 Added to the Avatars Collection:
🔍 Keywords: #DigitalAvatars #3DModeling #ComputerVision #MonocularVideo #SMPLX #3DGaussians #AvatarExpressiveness #HandTracking #FacialExpressions #AI #MachineLearning
![iofu728](https://cdn-avatars.huggingface.co/v1/production/uploads/6278bd42541f3d2dfa77ea70/ejn49eapnB3UXQckAYdTd.jpeg)
about a day ago
For more details, please check:
project page:
code:
paper:
hf demo:
![fdaudens](https://cdn-avatars.huggingface.co/v1/production/uploads/647f36a8454af0237bd49574/jshkqBUTY-GZL8As8y6Aq.jpeg)
about a day ago
Find exactly what you need with filters for:
- Modalities (text, image, audio, etc.)
- Dataset size
- File format
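If you'd rather script it, huggingface_hub exposes similar facets; the exact filter tag string below is my assumption, so check the hub docs if it doesn't match:

```python
from huggingface_hub import HfApi

api = HfApi()
# "modality:audio" mirrors the hub's modality facet (assumed tag string).
for ds in api.list_datasets(filter="modality:audio", limit=5):
    print(ds.id)
```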
Try it now:
What other filters would you find useful? Drop your ideas!
![victor](https://cdn-avatars.huggingface.co/v1/production/uploads/1616001397867-5f17f0a0925b9863e28ad517.png)
about a day ago
I'd be super happy to give you a GPU grant to host it on a Space, it would allow more people to discover and use it!
about a day ago
- 🎛️ Choose between the 27B IT and 9B IT models
- 🚀 Fast inference using llama.cpp (see the sketch after this list)
-
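As a rough illustration of llama.cpp inference from Python via llama-cpp-python (the GGUF filename is hypothetical; point it at your local Gemma 2 IT quant):

```python
from llama_cpp import Llama

# Hypothetical GGUF filename for illustration.
llm = Llama(model_path="gemma-2-9b-it.Q4_K_M.gguf", n_ctx=4096)
out = llm("Explain what a GGUF file is in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```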
![tomaarsen](https://cdn-avatars.huggingface.co/v1/production/uploads/6317233cc92fd6fee317e030/cJHSvvimr1kqgQfHOjO5n.png)
about a day ago
📊 Trained on a large dataset of 558k Arabic triplets translated from the AllNLI triplet dataset:
6️⃣ 6 different base models: AraBERT, MarBERT, LaBSE, MiniLM, paraphrase-multilingual-mpnet-base, mpnet-base, ranging from 109M to 471M parameters.
💪 Trained with a Matryoshka loss, allowing you to truncate embeddings with minimal performance loss: smaller embeddings are faster to compare (see the sketch after these bullets).
🏆 Outperforms all commonly used multilingual models like , , and .
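Here's a small sketch of how Matryoshka truncation works at query time (the model id is a placeholder, not one of the released checkpoints):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder model id for illustration.
model = SentenceTransformer("your-org/arabic-matryoshka-model")
emb = model.encode(["القطة تجلس على السجادة", "قطة صغيرة فوق سجادة"])

# Matryoshka property: keep only the first k dimensions, then re-normalize.
k = 256
small = emb[:, :k]
small = small / np.linalg.norm(small, axis=1, keepdims=True)
print(small @ small.T)  # cosine similarity on the truncated embeddings
```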
Check them out here:
-
-
-
-
-
-
Or the collection with all:
My personal favourite is likely : it is very efficient at 135M parameters and scores #1 on .
![alvdansen](https://cdn-avatars.huggingface.co/v1/production/uploads/635dd6cd4fabde0df74aeae6/5n9WPf3O1wqIOqhE1DtMK.jpeg)
about a day ago
I put together my own page of models using their code and LoRA. Enjoy!