AI Resources - Part 1
A collection of interesting AI tools, products, resources, papers, and more I’ve come across.
tl;dr sec #219
Gemini 1.5 Announcement - Uses a Mixture-of-Experts architecture and comes with a standard 128,000 token context window, but there’s a limited preview with a context window of up to 1 million tokens (1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words).
“In the Needle In A Haystack (NIAH) evaluation, where a small piece of text containing a particular fact or statement is purposely placed within a long block of text, 1.5 Pro found the embedded text 99% of the time, in blocks of data as long as 1 million tokens.”
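A minimal version of that needle-in-a-haystack setup is easy to reproduce. Here is a sketch (the filler text, needle, and substring-match scoring are all made up for illustration; Google's actual harness is more involved):

```python
def build_haystack(needle: str, filler_sentence: str, n_sentences: int, depth: float) -> str:
    """Embed `needle` at a relative `depth` (0.0 = start, 1.0 = end) in filler text."""
    sentences = [filler_sentence] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)

def found_needle(model_answer: str, expected: str) -> bool:
    """Score retrieval by a simple substring match on the expected fact."""
    return expected.lower() in model_answer.lower()

haystack = build_haystack(
    needle="The secret passphrase is 'mulberry'.",
    filler_sentence="The quick brown fox jumps over the lazy dog.",
    n_sentences=1000,
    depth=0.5,
)
# A real evaluation sends `haystack` plus a retrieval question to the model
# and scores the answer; here the haystack construction is the point.
```

Sweeping `depth` and `n_sentences` is how the usual NIAH heatmaps (depth vs. context length) get produced.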
“Gemini 1.5 Pro also shows impressive “in-context learning” skills, meaning that it can learn a new skill from information given in a long prompt, without needing additional fine-tuning. We tested this skill on the Machine Translation from One Book (MTOB) benchmark, which shows how well the model learns from information it’s never seen before.”
The killer app of Gemini Pro 1.5 is video
Simon Willison shares his experience playing around with Gemini Pro 1.5, and how it can take as input a quick video of his bookshelf and return the titles and authors as JSON.
AdGen AI - AI-generated creatives that perform.
tl;dr sec #218
traceloop/openllmetry-js - Open-source observability for your LLM application, based on OpenTelemetry.
ferrislucas/promptr - A CLI tool that lets you use plain English to instruct GPT-3 or GPT-4 to make changes to your codebase.
Deeptechia/geppetto - An advanced Slack bot integrating OpenAI's ChatGPT-4 and DALL-E-3 for interactive AI conversations and image generation. Enhances Slack communication with automated greetings, coherent responses, and creative visualizations.
lllyasviel/Fooocus - Image generating software, based on Gradio. Like Stable Diffusion, it’s offline, open source, and free. Like Midjourney, manual tweaking is not needed, users only need to focus on the prompts and images.
The System Prompt for ChatGPT
It’s interesting that it’s mostly just normal English instruction, no crazy prompt engineering.
Better Call GPT, Comparing Large Language Models Against Lawyers
Paper: “Our empirical analysis benchmarks LLMs against a ground truth set by Senior Lawyers, uncovering that advanced models match or exceed human accuracy in determining legal issues. In speed, LLMs complete reviews in mere seconds, eclipsing the hours required by their human counterparts. Cost wise, LLMs operate at a fraction of the price, offering a staggering 99.97 percent reduction in cost over traditional methods.”
tl;dr sec #217
screenshot-to-code - Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue).
wishful-search - A natural language search module for JSON arrays by Hrishi Olickel. Take any JSON array you have (notifications, movies, flights, people) and filter it with complex questions. WishfulSearch takes care of the prompting, database management, object-to-relational conversion and query formatting.
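The object-to-relational step such a tool performs can be sketched with stdlib sqlite3 (the data is hypothetical, and WishfulSearch's own schema inference and LLM prompting are more involved; only the final SQL would come from the model):

```python
import json
import sqlite3

# Hypothetical JSON array of the kind you might want to filter.
movies = json.loads("""[
  {"title": "Alien", "year": 1979, "rating": 8.5},
  {"title": "Aliens", "year": 1986, "rating": 8.4},
  {"title": "Alien 3", "year": 1992, "rating": 6.4}
]""")

# Flatten the array into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, rating REAL)")
conn.executemany("INSERT INTO movies VALUES (:title, :year, :rating)", movies)

# An LLM would translate "80s movies rated above 8" into SQL like this:
rows = conn.execute(
    "SELECT title FROM movies WHERE year BETWEEN 1980 AND 1989 AND rating > 8"
).fetchall()
```

Once the data is in SQL, complex natural-language questions reduce to text-to-SQL, which models are comparatively good at.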
ElevenLabs Speech-to-Speech - Say it how you want it and transform your voice into another character, with full control over emotions, timing, and delivery.
tl;dr sec #216
Why you should invest in AI
Sarah Guo makes the case for why you should invest your time and attention in AI.
The next grand challenge for AI
Jim Fan presents the next grand challenge in the quest for AI: the "foundation agent," which would seamlessly operate across both the virtual and physical worlds.
Enhancing Lecture Notes with AI
A student describes how they record and live transcribe lectures, then pass the transcript to an LLM to get summary notes, in addition to their main hand-written notes.
LangGraph for multi-agent workflows
New functionality from LangChain that makes it easy to construct multi-agent workflows: each node is an agent, and the edges represent how they communicate.
tl;dr sec #215
Local semantic search over folders. It will chunk and embed all nested supported files (txt, docx, pptx, jpg, png, eml, html, pdf).
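The chunking step such tools perform before embedding can be sketched as a sliding character window (the sizes here are illustrative defaults, not the tool's actual ones):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap keeps a sentence that straddles a chunk boundary fully inside at least one chunk, at the cost of embedding some text twice.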
Start with the most powerful model for your app’s use case (likely GPT-4). You want the best quality output so you can fine-tune a smaller model.
Store your AI requests/responses so they can be easily exported. He uses @helicone_ai, which you can easily swap in with the OpenAI APIs; it stores all of your AI requests in an exportable table.
After you’ve collected ~100-500+ request/response pairs, export them and clean the data so that the inputs and outputs are of high quality. You can also leverage feedback from users (e.g. thumbs up/thumbs down) if you have it.
With the clean dataset, use a hosted OSS AI service like Together or Anyscale to fine-tune Mixtral 8x7B. He’s gotten better results with these than fine-tuning GPT-3.5-Turbo on OpenAI.
Swap out GPT-4 with the fine-tuned model.
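The export-and-clean step above can be sketched as converting stored request/response pairs into chat-format JSONL (the field names follow the common OpenAI-style format and the quality filters are illustrative; your provider's expected format and your own cleaning criteria may differ):

```python
import json

def to_training_jsonl(pairs: list[dict], min_response_len: int = 20) -> str:
    """Filter low-quality pairs and emit one chat-format JSON object per line."""
    lines = []
    for pair in pairs:
        response = pair["response"].strip()
        if len(response) < min_response_len:  # crude quality filter
            continue
        if pair.get("feedback") == "thumbs_down":  # drop user-flagged outputs
            continue
        record = {"messages": [
            {"role": "user", "content": pair["prompt"]},
            {"role": "assistant", "content": response},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

The resulting file is what you would upload to the fine-tuning service; manual spot-checking of the surviving pairs is still worth the time.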
tl;dr sec #214
tl;dr sec #213
Common complaints about LLM output:
Tone too formal
Summarization too brief
Brainstorming not creative enough
Just do this:
Ask it to rate the output on a scale of 1-10 (define what 1 and 10 mean)
Then ask it to adjust the output to your desired number
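The rate-then-adjust trick can be sketched as a follow-up prompt template (the wording and scale anchors are illustrative; you would send this as the next turn in the conversation):

```python
def rating_followup(quality: str, target: int = 9) -> str:
    """Build a follow-up prompt asking the model to self-rate and then adjust.

    `quality` names the dimension you care about, e.g. "creativity" or
    "formality"; defining both ends of the scale is the important part.
    """
    return (
        f"Rate the {quality} of your previous answer on a scale of 1-10, "
        f"where 1 means extremely low {quality} and 10 means extremely high "
        f"{quality}. Then rewrite the answer, aiming for {target}/10."
    )
```

Defining the endpoints gives the model a concrete rubric, which tends to work better than vague requests like "make it more creative."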
tl;dr sec #212
Repeatedly asking ChatGPT to draw ever more normal images
And it gets weird.
ByteDance announces StemGen: A music generation model that listens
“Most models concentrate on generating fully mixed music in response to abstract conditioning information. In this work, we present an alternative paradigm for producing music generation models that can listen and respond to musical context.” (paper)
Deep dive: 4 NeurIPS 2023 best paper award papers - emergent ability, scaling, DPO, trustworthiness
Sophia Yang discusses these NeurIPS 2023 best paper award winners.
OpenAI’s Official Prompt Engineering Guide
Six strategies, each with several tactics, for getting better results:
Write clear instructions
  - Include details in your query to get more relevant answers
  - Ask the model to adopt a persona
  - Use delimiters to clearly indicate distinct parts of the input
  - Specify the steps required to complete a task
  - Specify the desired length of the output
Provide reference text
  - Instruct the model to answer using a reference text
  - Instruct the model to answer with citations from a reference text
Split complex tasks into simpler subtasks
  - Use intent classification to identify the most relevant instructions for a user query
  - For dialogue applications that require very long conversations, summarize or filter previous dialogue
  - Summarize long documents piecewise and construct a full summary recursively
Give the model time to "think"
  - Instruct the model to work out its own solution before rushing to a conclusion
  - Use inner monologue or a sequence of queries to hide the model's reasoning process
  - Ask the model if it missed anything on previous passes
Use external tools
  - Use embeddings-based search to implement efficient knowledge retrieval
  - Use code execution to perform more accurate calculations or call external APIs
  - Give the model access to specific functions
Test changes systematically
  - Evaluate model outputs with reference to gold-standard answers
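The tactic "summarize long documents piecewise and construct a full summary recursively" can be sketched with a placeholder summarizer (a real implementation would call the model with a "summarize this" prompt; simple truncation stands in for that call, and all sizes are illustrative):

```python
def summarize(text: str, limit: int = 200) -> str:
    # Placeholder: a real implementation would send `text` to the model.
    # Truncation stands in for the API call so the skeleton is runnable.
    return text[:limit]

def recursive_summary(document: str, chunk_size: int = 1000) -> str:
    """Summarize a long document piecewise, then summarize the summaries."""
    if len(document) <= chunk_size:
        return summarize(document)
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    combined = "\n".join(summarize(chunk) for chunk in chunks)
    return recursive_summary(combined, chunk_size)
```

Each pass shrinks the text by roughly `chunk_size / limit`, so the recursion terminates quickly even for very long documents.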
crewai - An OSS framework for orchestrating role-playing, autonomous agents.
E2B - Secure sandboxed cloud environments made for AI agents and AI apps. They’ve open sourced most of the underlying code.
Steamship - “The development platform for AI Agents.” Build AI Agents with their Python SDK, and effortlessly deploy them to the cloud. Gain access to serverless cloud hosting, vector search, webhooks, callbacks, and more.
Lindy.ai - “Meet your AI employee.” A no-code product aiming to make it easy to create a team of various AI agents using only an English description of how they should behave (their prompt).
Without yet looking into it deeply, what seems to differentiate Lindy from the other agent platforms is that it appears aimed at non-developer audiences and focuses on having many integrations, like Zapier, that make it easy to have agents interact with your calendar, email, GitHub, or whatever other systems you’re using.
Relevance AI - No code “build your AI workforce” platform.
AgentGPT - An autonomous AI Agent platform that empowers users to create and deploy customizable autonomous AI agents directly in the browser.
AgentRunner - “Create autonomous AI agents.”
research-agents-3.0 - Repo demonstrating Autogen + GPTs to build a group of AI researchers.
The State of AI Agents
Great roundup by the E2B folks on products built on top of agents, their challenges, standardization, and more, with some useful overview diagrams of many players in the space.
tl;dr sec #211
Mozilla-Ocho/llamafile: Distribute and run LLMs with a single file
Grimoire - The top programming GPT right now.
DeepMake: An Adobe After Effects plugin that brings GenAI into your creative workflow.
elfvingralf/macOSpilot-ai-assistant - Voice + Vision powered AI assistant that answers questions about any application, in context and in audio.
Role-playing with AI will be a powerful tool for writers and educators - For example, GPT-4 helping you understand what an acid trip in 1963 would be like, or giving students the ability to make choices and decisions as historical actors.
Paper: Magicoder: Source Code Is All You Need - Magicoder is “a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters. Magicoder models are trained on 75K synthetic instruction data using OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets to generate high-quality instruction data for code.”
The difference between GPT-4 being told in its prompt that it would receive no tip, a $20 tip, and a $200 tip.
— Andrew Curran (@AndrewCurran_)
Dec 2, 2023
We pulled off an SEO heist using AI.
1. Exported a competitor’s sitemap
2. Turned their list of URLs into article titles
3. Created 1,800 articles from those titles at scale using AI
18 months later, we have stolen:
- 3.6M total traffic
- 490K monthly traffic
— Jake Ward (@jakezward)
Nov 24, 2023
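The first two steps of that pipeline can be sketched with stdlib XML parsing (the sitemap content and the slug-to-title heuristic are illustrative; the article generation itself is where the LLM comes in):

```python
import xml.etree.ElementTree as ET

# A toy sitemap; a real one would be fetched from the competitor's site.
SITEMAP = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/how-to-brew-coffee</loc></url>
  <url><loc>https://example.com/blog/best-espresso-machines</loc></url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_to_titles(sitemap_xml: str) -> list[str]:
    """Extract URLs from a sitemap and turn their slugs into article titles."""
    root = ET.fromstring(sitemap_xml)
    titles = []
    for loc in root.findall(".//sm:loc", NS):
        slug = loc.text.rstrip("/").rsplit("/", 1)[-1]
        titles.append(slug.replace("-", " ").title())
    return titles
```

Each recovered title would then be fed to a model as the topic for a generated article, which is the step that made this controversial.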
Do I have a quick feedback loop?
Can you verify quickly if the LLM output is correct or if it’s wasting your time?
Syntax highlighting, tests, run and observe behavior.
Do I have a reliable feedback loop?
What is the margin of error?
Do I need very recent info?
tl;dr sec #210
HeyGen - AI-powered video creations at scale. New features: instant avatar (create an AI version of yourself), and translate you speaking in videos to another language.
Meet Aitana: The first Spanish AI model earning up to $11K/month. The thread includes some links to useful tutorials and guides.
Noiselith: Desktop app for Stable Diffusion XL so you can easily run it locally, offline.
AutoGen's TeachableAgent: New Autogen blog post that includes examples. TeachableAgent uses TextAnalyzerAgent so that users can teach their LLM-based assistants new facts, preferences, and skills.
tl;dr sec #209
LangChain Templates Hub: 60+ templates contributed by the community. Search by popularity, or filter by tags for use cases and integrations.
vectara/hallucination-leaderboard: Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
LLM Chain querying a scientific Zotero library, with citations (Zotero is a popular tool for academics to manage bibliography data)
Data Analysis - Drop in files and it will analyze and visualize your data
AI + Music, Images, or Video
Scribble Diffusion: Turn your sketch into a refined image using AI
Dall-E Party: Recursively generate an image with DALL-E 3, describe it with GPT4 Vision, use that description with DALL-E 3, …
People think white AI-generated faces are more real than actual photos, study says - Attractiveness and "averageness" of AI-generated faces made them seem more real to the study participants, while the large variety of proportions in actual faces seemed unreal.
Frigate: Monitor your security cameras with locally processed AI.
Script that takes pics using your webcam and describes you like David Attenborough using GPT-4 Vision and ElevenLabs. Worth watching the demo video.
Introducing Stable Video Diffusion - The first foundation model for generative video based on the image model Stable Diffusion.
Meta brings us closer to AI-generated movies: Given a caption, image, or a photo paired with a description, Emu Video can generate a 4 second animated clip. A complementary tool can then edit those clips using natural language: “the same clip, but in slow motion.”
New music model from Google DeepMind: “With our music AI tools, users can create new music or instrumental sections from scratch, transform audio from one music style or instrument to another, and create instrumental and vocal accompaniments.” A limited set of creators will also be able to generate a unique soundtrack in the voice and style of participating artists like Charlie Puth, Demi Lovato, Sia, T-Pain, and more.
I’m not sure what timeline we’re in for there to be articles like this: People Can’t Access Their AI Girlfriend Because the Service Went Down After CEO Jailed for Setting His Apartment on Fire
LLMs cannot find reasoning errors, but can correct them!
Paper in which the authors break down the self-correction process into two core components: mistake finding and output correction. They find that LLMs generally struggle with finding logical mistakes, but for output correction, they propose a backtracking method which provides large improvements when given information on mistake location.
Outset is using GPT-4 to make user surveys better
YC-backed Outset uses GPT-4 to autonomously conduct and synthesize user surveys. Outset users create a survey and share the link with prospective survey takers, then Outset follows up with respondents to clarify, probe on answers and create a “conversational rapport” for deeper responses. Outset enabled WeightWatchers to conduct and synthesize over 100 interviews in 24 hours.
AI Explained had a nice series of videos about it:
Altman’s polarizing past hints at OpenAI board’s reason for firing him
Previously Y Combinator founder Paul Graham gave Sam the boot from leading YC. Sam “had developed a reputation for favoring personal priorities over official duties and for an absenteeism that rankled his peers and some of the start-ups he was supposed to nurture.”
Re: the new OpenAI board: “Altman was unwilling to talk to anyone he didn’t already know. By Sunday, it became clear that Altman wanted a board composed of a majority of people who would let him get his way.”
“One person who has worked closely with Altman described a pattern of consistent and subtle manipulation that sows division between individuals.”
“A former OpenAI employee, machine learning researcher Geoffrey Irving, who now works at competitor Google DeepMind, wrote that he was disinclined to support Altman after working for him for two years. “1. He was always nice to me. 2. He lied to me on various occasions 3. He was deceptive, manipulative, and worse to others, including my close friends (again, only nice to me, for reasons).””
Exclusive: OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say
Supposedly several staff researchers at OpenAI wrote a letter to the board of directors warning of a powerful AI discovery that could threaten humanity. Allegedly there was a project, Q*, that was able to solve certain math problems, implying it might have greater reasoning capabilities than just predicting the next word. This could be applied to novel scientific research, for instance.
This may have been what Sam Altman meant when he said being in the room “where we push the veil of ignorance back and the frontier of discovery forward.”
OpenAI’s Misalignment and Microsoft’s Gain
Stratechery deep dive on the implications of OpenAI’s non-profit model and governance situation, internal cultural dynamics at OpenAI, Microsoft’s role, Altman’s reputation, and thoughts going forward.