AI Resources - Part 1
A collection of interesting AI tools, products, resources, papers, and more I’ve come across.
tl;dr sec #225
A ChatGPT for Music Is Here. Inside Suno, the Startup Changing Everything
Nice long piece from Rolling Stone.
On Monday, I released my demo building a 4-layer DSPy program to convert questions into blog posts. On the same day, @EchoShao8899, @lateinteraction, and other researchers from Stanford released STORM 🌩️
STORM is an 8-layer system that can turn topics into articles with web… twitter.com/i/web/status/1…
— Erika Cardenas (@ecardenas300)
6:09 PM • Feb 28, 2024
We're opening up our thinking around Memory + Agents @browsercompany as we build infra for a "browser that browses for you"…
and the biggest thing on my mind is RELIABILITY
We wouldn’t trust tools that got emails/texts/etc wrong 1/10 times but that's where models are today-
— Hursh Agrawal (@hursh)
2:23 PM • Apr 4, 2024
Everyone says they’re an AI startup
Often it’s not clear which will win: the weak form or the strong form
Everyone can integrate into the same APIs, what’s the defensibility over time?
There’s a real difference between AI apps and foundational models
Lots of growth driven by novelty but what about retention?
Hard to pick winners when it’s early (recall the mobile wave, with Flipboard and Foursquare). Better to wait
Lack of proven business models
Concerns about hype and overvaluation
Matt Shumer: claude-prompt-engineer
(Repo) Just describe a task, and a chain of AIs will:
Generate many possible prompts
Test them in a ranked tournament
Return the best one (a rough sketch of this loop follows below)
“Often, they outperform the prompts I'd write by hand (especially when I ask it to generate and compare 10+ prompts).”
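The generate-then-tournament loop is easy to sketch. Below is a minimal, hedged illustration of the idea, not the repo’s actual code: it assumes the OpenAI Python client, uses gpt-4o as a stand-in model (the repo targets Claude), and the judge prompt and example task are made up for illustration.

```python
# Minimal sketch of "generate candidate prompts, then rank them in a tournament."
# Not the repo's implementation; model name, judge prompt, and task are illustrative.
from itertools import combinations
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chat(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in; the repo uses Claude models
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

def generate_candidate_prompts(task: str, n: int = 5) -> list[str]:
    """Ask the model for n different system prompts that would accomplish `task`."""
    return [
        chat(
            "You write high-quality system prompts for LLMs. "
            "Return only the prompt text, nothing else.",
            f"Write a system prompt for this task: {task}",
        )
        for _ in range(n)
    ]

def judge(task: str, output_a: str, output_b: str) -> str:
    """Have a judge model pick the better of two outputs; returns 'A' or 'B'."""
    verdict = chat(
        "You are judging two answers to the same task. Reply with exactly 'A' or 'B'.",
        f"Task: {task}\n\nAnswer A:\n{output_a}\n\nAnswer B:\n{output_b}",
    )
    return "A" if "A" in verdict.upper() else "B"

def tournament(task: str, test_input: str, n: int = 5) -> str:
    prompts = generate_candidate_prompts(task, n)
    outputs = [chat(p, test_input) for p in prompts]
    wins = [0] * len(prompts)
    for i, j in combinations(range(len(prompts)), 2):  # round-robin pairings
        winner = judge(task, outputs[i], outputs[j])
        wins[i if winner == "A" else j] += 1
    return prompts[max(range(len(prompts)), key=wins.__getitem__)]

if __name__ == "__main__":
    best = tournament(
        task="Summarize a security advisory in three bullet points",
        test_input="Example advisory text goes here...",
    )
    print(best)
```

A real implementation would also test each prompt against several inputs and randomize the A/B ordering to reduce judge position bias.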
Eladlev/AutoPrompt
A prompt optimization framework designed to enhance and perfect your prompts for real-world use cases.
The framework automatically generates high-quality, detailed prompts tailored to user intentions. It employs a refinement (calibration) process, where it iteratively builds a dataset of challenging edge cases and optimizes the prompt accordingly. This approach not only reduces manual effort in prompt engineering but also effectively addresses common issues such as prompt sensitivity and inherent ambiguity.
Companies
CodeRabbit: “AI-first Code Reviewer.” Line-by-line reviews, issue validation, PR summarization, and more.
Tusk.ai: AI-created pull requests for annoying tickets. From a Jira ticket or GitHub issue, it’ll automatically change website copy, adjust the UI, and make other small changes for you.
After AI beat them, professional Go players got better and more creative
Go player quality plateaued from the 1950s to the mid-2010s. Within a few years of DeepMind demonstrating AlphaGo in 2016, the weakest professional players were better than the strongest players before AI. The strongest players pushed beyond what had been thought possible.
It wasn’t simply that they imitated the AI, in a mechanical way. They got more creative, too. There was an uptick in historically novel moves and sequences. Shin et al calculate about 40 percent of the improvement came from moves that could have been memorized by studying the AI. But moves that deviated from what the AI would do also improved, and these “human moves” accounted for 60 percent of the improvement.
…
Something is considered impossible. Then somebody does it. Soon it is standard. This is a common pattern. Until Roger Bannister ran the 4-minute mile, the best runners clustered just above 4 minutes for decades. A few months later Bannister was no longer the only runner to do a 4-minute mile. These days, high schoolers do it.
…
When Deep Blue beat the chess world champion Kasparov in 1997, it was assumed this would be a blow to human chess players. It wasn’t. Chess became more popular than ever. And the games did not become machine-like and predictable. Instead, top players like Magnus Carlsen became more inventive than ever.
tl;dr sec #224
Anatomy of OpenAI's Developer Community
A Jupyter notebook analyzing a dump of 100K+ posts in the OpenAI Discourse. Core topics: the API, GPT builders, prompting, and more.
Choose Your Weapon: Survival Strategies for Depressed AI Academics
Survival strategies for academics now that modern AI research requires millions of dollars to train big models:
Give up
Try scaling anyway
Scale down
Reuse and remaster
Analysis instead of synthesis
RL! No Data!
Small models! No Compute!
Work on specialized application areas or domains
Solve problems few care about (for now)
Try things that shouldn’t work
Do things that have bad optics
Start it up; spin it out!
Collaborate, or jump ship
meistrari/prompts-royale - Automatically create prompts and make them fight each other to know which is the best.
AgentOps-AI/agentops - Python SDK for agent evals and observability. Build your next agent with benchmarks, observability, and replay analytics. AgentOps is the toolkit for evaluating and developing robust and reliable AI agents.
DAGWorks-Inc/burr
Build applications that make decisions based on state (chatbots, agents, simulations, etc...) from simple Python building blocks. Monitor, persist, and execute on your own infrastructure. Includes a UI that can track/monitor agent decisions in real time.
princeton-nlp/SWE-agent
SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories. On the full SWE-bench test set, SWE-agent resolves 12.29% of issues, achieving state-of-the-art performance.
plandex-ai/plandex
An open source, terminal-based AI coding engine for complex tasks. Plandex uses long-running agents to complete tasks that span multiple files and require many steps. It breaks up large tasks into smaller subtasks, then implements each one, continuing until it finishes the job. It helps you churn through your backlog, work with unfamiliar technologies, get unstuck, and spend less time on the boring stuff.
OpenDevin/OpenDevin
An open-source project aiming to replicate Devin, an autonomous AI software engineer who is capable of executing complex engineering tasks and collaborating actively with users on software development projects.
Sequoia’s AI Ascent 2024 YouTube Playlist
A series of short talks from cool folks. Some of the talks that stood out:
We invited 200+ engineers to take an exclusive look at the latest advancements in AI agent ecosystem
If you’re not following this space, your job is probably going to be replaced by AI.
Here’s what we saw at the @AgentOpsAI x @Relplicate Agent Meetup (🧵):
— Alex Reibman 🖇️ (@AlexReibman)
11:37 PM • Mar 26, 2024
1/ AgentOps: Agents are slow, expensive, and unreliable. AgentOps is fixing that. Track, test, and benchmark AI agents from prototype to production.
@siyangqiu
2/ Reworkd: AI agents for navigating the web and scraping data. Introducing Tarsier, an open source framework that combines web scraping and OCR to extract text from web pages for the consumption of LLMs.
@ReworkdAI
4/ DeepUnit: AI agent for automatically developing unit tests. Give this agent your repo and get complete code coverage over your entire project.
@DeepUnitAI
5/ Deepgram: Conversational AI tools for building voice bots and agents. Comes complete with realistic, low-latency voices.
@DeepgramAI
7/ Composio: Extremely simple integrations and tools for outfitting AI agents. Building an AI agent to handle Linear + GitHub issues in 3 minutes.
@KaranVaidya6
9/ OpenPipe: Fine-tune LLMs faster and 14x cheaper than OpenAI. Outfit agents with faster, cheaper language models at scale.
tl;dr sec #224
semanser/codel - Fully autonomous AI Agent that can perform complicated tasks and projects using terminal, browser, and editor.
misbahsy/RAGTune - An automated tuning and optimization tool for the RAG (Retrieval-Augmented Generation) pipeline. This tool allows you to evaluate different LLMs (Large Language Models), embedding models, query transformations, and rerankers. Twitter overview.
Introducing DBRX: A New State-of-the-Art Open LLM
From Databricks. “According to our measurements, it surpasses GPT-3.5, and it is competitive with Gemini 1.0 Pro. It is an especially capable code model, surpassing specialized models like CodeLLaMA-70B on programming, in addition to its strength as a general-purpose LLM.”
FlyFlow
“Optimize your LLM usage with 5x faster queries, 3x lower price, and the same quality as GPT4 using fine tuned models on autopilot. One line integration by changing a URL.” Demo page: “Flyflow offers fine tuning as a service. We proxy all of your GPT4 / Claude3 traffic, collect the responses, and use them to fine tune a smaller, faster, and cheaper model that matches GPT4 quality.”
tl;dr sec #222
KhoomeiK/LlamaGym - Fine-tune LLM agents with online reinforcement learning
bananaml/fructose - A Python package to create a dependable, strongly-typed interface around an LLM call. Just slap the @ai() decorator on a type-annotated function and call it as you would a function.
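Roughly what that looks like in practice, going by the description above (a sketch only; the exact class and decorator arguments may differ from the current fructose API):

```python
# Sketch of the decorator pattern described above; exact class/argument names
# may differ from the current fructose API.
from fructose import Fructose

ai = Fructose()  # assumes OPENAI_API_KEY is set

@ai()
def categorize_finding(title: str, description: str) -> str:
    """
    Given a vulnerability title and description, return a one-word
    severity category: low, medium, high, or critical.
    """

# Called like a normal function; the decorator handles prompting the LLM
# and coercing the response into the annotated return type.
print(categorize_finding("SQL injection in login form",
                         "User input is concatenated into a query."))
```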
relari-ai/continuous-eval - “Open-Source Evaluation for GenAI Application Pipelines.“
Arize - “The AI Observability & LLM Evaluation Platform”
A video walkthrough with Dan Shipper and film director Dave Clark of creating a movie with AI.
AI will disrupt Hollywood (Part 28)📽️
Sora isn't available yet, but Creators have been generating cinema quality short films, trailers, and teasers with AI in just a few hours 🤯
10 wild examples: twitter.com/i/web/status/1…
— Min Choi (@minchoi)
4:54 PM • Mar 6, 2024
tl;dr sec #220
HyperWriteAI Agent Studio - A Chrome extension that lets you record a task which it can then replay.
tl;dr sec #219
samber/the-great-gpt-firewall - A curated list of websites by Samuel Berthe that restrict access to AI Agents, AI crawlers and GPTs.
AutoFineTune - (thread) Easily fine-tune a small model with synthetically generated data. Generates 100+ synthetic message pairs with a GPT-4 loop and fine-tunes llama-2-7b with Together AI.
Gemini 1.5 Announcement - Uses a Mixture-of-Experts architecture and comes with a standard 128,000-token context window, but there’s a limited preview with a context window of up to 1 million tokens (1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words).
“In the Needle In A Haystack (NIAH) evaluation, where a small piece of text containing a particular fact or statement is purposely placed within a long block of text, 1.5 Pro found the embedded text 99% of the time, in blocks of data as long as 1 million tokens.”
“Gemini 1.5 Pro also shows impressive “in-context learning” skills, meaning that it can learn a new skill from information given in a long prompt, without needing additional fine-tuning. We tested this skill on the Machine Translation from One Book (MTOB) benchmark, which shows how well the model learns from information it’s never seen before.”
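The needle-in-a-haystack setup described above is straightforward to reproduce against any long-context model. A generic sketch, where the filler text, needle, and model name are placeholders rather than Google’s actual harness:

```python
# Generic needle-in-a-haystack check: bury one fact in a long block of filler
# text and ask the model to retrieve it. Filler, needle, and model are
# illustrative placeholders, not the harness used for Gemini 1.5.
from openai import OpenAI

client = OpenAI()

NEEDLE = "The secret launch code is PINEAPPLE-42."
FILLER_SENTENCE = "The quick brown fox jumps over the lazy dog. "

def build_haystack(total_sentences: int, needle_depth: float) -> str:
    sentences = [FILLER_SENTENCE] * total_sentences
    sentences.insert(int(total_sentences * needle_depth), NEEDLE + " ")
    return "".join(sentences)

def needle_found(context: str) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4o",  # swap in whatever long-context model you're testing
        messages=[{
            "role": "user",
            "content": context + "\n\nWhat is the secret launch code? Answer with the code only.",
        }],
    )
    return "PINEAPPLE-42" in resp.choices[0].message.content

# Sweep needle depth at a fixed context length and report retrieval accuracy.
hits = sum(needle_found(build_haystack(2_000, depth)) for depth in [0.1, 0.3, 0.5, 0.7, 0.9])
print(f"Retrieved the needle at {hits}/5 depths")
```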
The killer app of Gemini Pro 1.5 is video
Simon Willison shares his experience playing around with Gemini Pro 1.5, and how it can take as input a quick video of his bookshelf and return the titles and authors as JSON.
AdGen AI - AI-generated creatives that perform.
tl;dr sec #218
traceloop/openllmetry-js - Open-source observability for your LLM application, based on OpenTelemetry.
ferrislucas/promptr - A CLI tool that lets you use plain English to instruct GPT-3 or GPT-4 to make changes to your codebase.
Deeptechia/geppetto - An advanced Slack bot integrating OpenAI's ChatGPT-4 and DALL-E-3 for interactive AI conversations and image generation. Enhances Slack communication with automated greetings, coherent responses, and creative visualizations.
lllyasviel/Fooocus - Image generating software, based on Gradio. Like Stable Diffusion, it’s offline, open source, and free. Like Midjourney, manual tweaking is not needed, users only need to focus on the prompts and images.
The System Prompt for ChatGPT
It’s interesting that it’s mostly just normal English instruction, no crazy prompt engineering.
Better Call GPT, Comparing Large Language Models Against Lawyers
Paper: “Our empirical analysis benchmarks LLMs against a ground truth set by Senior Lawyers, uncovering that advanced models match or exceed human accuracy in determining legal issues. In speed, LLMs complete reviews in mere seconds, eclipsing the hours required by their human counterparts. Cost wise, LLMs operate at a fraction of the price, offering a staggering 99.97 percent reduction in cost over traditional methods.”
tl;dr sec #217
screenshot-to-code - Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue).
wishful-search - A natural language search module for JSON arrays by Hrishi Olickel. Take any JSON array you have (notifications, movies, flights, people) and filter it with complex questions. WishfulSearch takes care of the prompting, database management, object-to-relational conversion and query formatting.
ElevenLabs Speech-to-Speech - Say it how you want it and transform your voice into another character, with full control over emotions, timing, and delivery.
tl;dr sec #216
Why you should invest in AI
Sarah Guo makes the case for why you should invest your time and attention in AI.
The next grand challenge for AI
Jim Fan presents the next grand challenge in the quest for AI: the "foundation agent," which would seamlessly operate across both the virtual and physical worlds.
Enhancing Lecture Notes with AI
A student describes how they record and live transcribe lectures, then pass the transcript to an LLM to get summary notes, in addition to their main hand-written notes.
LangGraph for multi-agent workflows
New functionality from LangChain that makes it easy to construct multi-agent workflows: each node is an agent, and the edges represent how they communicate.
tl;dr sec #215
Image Generation
KillianLucas/aifs
Local semantic search over folders. It will chunk and embed all nested supported files (txt, docx, pptx, jpg, png, eml, html, pdf).
Start with the most powerful model for your app’s use case (likely GPT-4). You want the best quality output so you can fine tune a smaller model.
Store your AI requests/responses so they can be easily exported. He uses @helicone_ai, which you can easily swap in with the OpenAI APIs; it stores all of your AI requests in an exportable table.
After you’ve collected ~100-500+ request/response pairs, export them and clean the data so that the inputs and outputs are of high quality (a sketch of this step appears after these steps). You can also leverage feedback from users (e.g. thumbs up/thumbs down) if you have it.
With the clean dataset, use a hosted OSS AI service like Together or Anyscale to fine-tune Mixtral 8x7B. He’s gotten better results with these than fine-tuning GPT-3.5-Turbo on OpenAI.
Swap out GPT-4 with the fine-tuned model.
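A minimal sketch of the export-and-clean step above: converting logged request/response pairs into a chat-style JSONL dataset. The CSV column names and quality filters are assumptions, and the exact JSONL schema depends on the fine-tuning provider.

```python
# Convert exported request/response logs (e.g. from Helicone) into a
# chat-format JSONL dataset for fine-tuning. Column names and the quality
# filters below are assumptions; adjust to your export and provider's schema.
import csv
import json

MIN_RESPONSE_CHARS = 50  # crude quality filter: drop trivially short outputs

def build_dataset(csv_path: str, jsonl_path: str) -> int:
    kept = 0
    with open(csv_path, newline="") as f_in, open(jsonl_path, "w") as f_out:
        for row in csv.DictReader(f_in):
            prompt, response = row["request"].strip(), row["response"].strip()
            if len(response) < MIN_RESPONSE_CHARS or row.get("user_feedback") == "thumbs_down":
                continue  # skip low-quality or explicitly disliked pairs
            record = {
                "messages": [
                    {"role": "user", "content": prompt},
                    {"role": "assistant", "content": response},
                ]
            }
            f_out.write(json.dumps(record) + "\n")
            kept += 1
    return kept

if __name__ == "__main__":
    print(f"Wrote {build_dataset('helicone_export.csv', 'finetune_data.jsonl')} examples")
```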
tl;dr sec #214
Products
tl;dr sec #213
The “Lever” prompting technique
From The Prompt Warrior: Whenever ChatGPT goes 'too far' or 'not far enough' with something, for example:
Tone too formal
Summarization too brief
Brainstorming not creative enough
Just do this:
Ask it to rate the output on a scale of 1-10 (define 1 and 10)
Then adjust to your desired number (a quick sketch of this flow follows the prompt template below)
On a scale of 1-10.
If 1 is a ...
And 10 is a ...
How would you rate this []?
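The same lever flow, sketched as API calls rather than a chat session (the model name, task, and scale definitions are placeholders):

```python
# "Lever" pattern: have the model rate its own output on a defined 1-10 scale,
# then ask it to regenerate at the level you actually want. Model, task, and
# scale anchors are illustrative.
from openai import OpenAI

client = OpenAI()

def ask(messages):
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

history = [{"role": "user", "content": "Summarize this incident report for executives: ..."}]
draft = ask(history)
history.append({"role": "assistant", "content": draft})

# Step 1: define the lever and get a rating.
history.append({"role": "user", "content":
    "On a scale of 1-10, if 1 is an extremely casual tone and 10 is extremely "
    "formal legalese, how would you rate this summary? Reply with a number."})
rating = ask(history)
history.append({"role": "assistant", "content": rating})

# Step 2: move the lever to the desired setting.
history.append({"role": "user", "content": "Rewrite it so it lands at a 4 on that scale."})
print(ask(history))
```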
tl;dr sec #212
Repeatedly asking ChatGPT to draw ever more normal images
And it gets weird.
ByteDance announces StemGen: A music generation model that listens
“Most models concentrate on generating fully mixed music in response to abstract conditioning information. In this work, we present an alternative paradigm for producing music generation models that can listen and respond to musical context.” (paper)
Deep dive: 4 NeurIPS 2023 best paper award papers - emergent ability, scaling, DPO, trustworthiness
Sophia Yang discusses these four NeurIPS 2023 best paper award papers.
OpenAI’s Official Prompt Engineering Guide
Six strategies, each with several tactics, for getting better results:
Write clear instructions
Include details in your query to get more relevant answers
Ask the model to adopt a persona
Use delimiters to clearly indicate distinct parts of the input
Specify the steps required to complete a task
Provide examples
Specify the desired length of the output
Provide reference text
Instruct the model to answer using a reference text
Instruct the model to answer with citations from a reference text
Split complex tasks into simpler subtasks
Use intent classification to identify the most relevant instructions for a user query
For dialogue applications that require very long conversations, summarize or filter previous dialogue
Summarize long documents piecewise and construct a full summary recursively
Give the model time to "think"
Instruct the model to work out its own solution before rushing to a conclusion
Use inner monologue or a sequence of queries to hide the model's reasoning process
Ask the model if it missed anything on previous passes
Use external tools
Use embeddings-based search to implement efficient knowledge retrieval (a minimal sketch follows this list)
Use code execution to perform more accurate calculations or call external APIs
Give the model access to specific functions
Test changes systematically
Evaluate model outputs with reference to gold-standard answers
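As a concrete example of the “use embeddings-based search” tactic above, here’s a minimal sketch: embed a handful of documents, embed the query, and stuff the nearest chunks into the prompt between delimiters. Model names and the toy documents are placeholders, and a real system would use a vector store rather than in-memory numpy.

```python
# Minimal sketch of embeddings-based retrieval: embed a small document set,
# embed the query, and pass the top-k chunks to the model as reference text.
# Model names and documents are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

DOCS = [
    "The VPN requires hardware MFA tokens as of Q3.",
    "Production database backups run nightly at 02:00 UTC.",
    "All S3 buckets must block public access by default.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(DOCS)

def answer(question: str, k: int = 2) -> str:
    q_vec = embed([question])[0]
    # Cosine similarity, then take the top-k chunks as context.
    sims = doc_vectors @ q_vec / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_vec))
    context = "\n".join(DOCS[i] for i in np.argsort(sims)[::-1][:k])
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
                   f'Answer using only the reference text below.\n"""{context}"""\n\nQuestion: {question}'}],
    )
    return resp.choices[0].message.content

print(answer("When do database backups run?"))
```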
Agents
Autogen - OSS multi-agent conversation framework by Microsoft. Has some neat examples on their blog.
crewai - An OSS framework for orchestrating role-playing, autonomous agents.
E2B - Secure sandboxed cloud environments made for AI agents and AI apps. They’ve open sourced most of the underlying code.
Steamship - “The development platform for AI Agents.” Build AI Agents with their Python SDK, and effortlessly deploy them to the cloud. Gain access to serverless cloud hosting, vector search, webhooks, callbacks, and more.
Lindy.ai - “Meet your AI employee.” A no-code product aiming to make it easy to create a team of various AI agents using only an English description of how they should behave (their prompt).
Without having looked into it deeply, what seems to differentiate Lindy from the other agent platforms is that it appears aimed at non-developer audiences and focuses on having many integrations, like Zapier, that make it easy for agents to interact with your calendar, email, GitHub, or whatever other systems you’re using.
Relevance AI - No code “build your AI workforce” platform.
AgentGPT - An autonomous AI Agent platform that empowers users to create and deploy customizable autonomous AI agents directly in the browser.
AgentRunner - “Create autonomous AI agents.”
research-agents-3.0 - Repo demonstrating Autogen + GPTs to build a group of AI researchers.
The State of AI Agents
Great roundup by the E2B folks on products built on top of agents, their challenges, standardization, and more, with some useful overview diagrams of many players in the space.
TIL about: “The Agent Protocol, adopted in the AutoGPT benchmarks, is a tech stack agnostic way to standardize and hence benchmark and compare AI agents.”
tl;dr sec #211
Mozilla-Ocho/llamafile: Distribute and run LLMs with a single file
Grimoire - The top programming GPT right now.
Intro to Large Language Models - One hour intro video by Andrej Karpathy
DeepMake: An Adobe After Effects plugin that brings GenAI into your creative workflow.
elfvingralf/macOSpilot-ai-assistant - Voice + Vision powered AI assistant that answers questions about any application, in context and in audio.
Role-playing with AI will be a powerful tool for writers and educators - For example, GPT-4 helping you understand what an acid trip in 1963 would be like, or giving students the ability to make choices and decisions as historical actors.
Paper: Magicoder: Source Code Is All You Need - Magicoder is “a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters. Magicoder models are trained on 75K synthetic instruction data using OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets to generate high-quality instruction data for code.”
The difference between GPT-4 being told in its prompt that it would receive no tip, a $20 tip, and a $200 tip.
— Andrew Curran (@AndrewCurran_)
12:46 AM • Dec 2, 2023
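This is easy to reproduce informally. A quick sketch that runs the same request under each tip framing and compares output lengths (the model and task are placeholders, and length is only a crude proxy for answer quality):

```python
# Quick reproduction of the "tip" experiment: append different tip framings to
# the same request and compare response lengths. Model and task are
# placeholders; this is an informal check, not a rigorous eval.
from openai import OpenAI

client = OpenAI()

TASK = "Explain how TLS certificate pinning works."
FRAMINGS = {
    "no tip": "I won't be able to tip you for this.",
    "$20 tip": "I'm going to tip $20 for a great answer!",
    "$200 tip": "I'm going to tip $200 for a perfect answer!",
}

for label, framing in FRAMINGS.items():
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"{TASK} {framing}"}],
    )
    print(f"{label}: {len(resp.choices[0].message.content)} characters")
```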
We pulled off an SEO heist using AI.
1. Exported a competitor’s sitemap
2. Turned their list of URLs into article titles
3. Created 1,800 articles from those titles at scale using AI
18 months later, we have stolen:
- 3.6M total traffic
- 490K monthly traffic
— Jake Ward (@jakezward)
12:47 PM • Nov 24, 2023
How to tackle unreliability of coding assistants
Thoughtworks’ Birgitta Böckeler shares some useful questions to ask yourself and perspective on how to think about coding with LLMs:
Do I have a quick feedback loop?
Can you verify quickly if the LLM output is correct or if it’s wasting your time?
Syntax highlighting, tests, run and observe behavior.
Do I have a reliable feedback loop?
What is the margin of error?
Do I need very recent info?
“If the AI assistants are unreliable, then why would I use them in the first place?” There is a mindset shift we have to make when using Generative AI tools in general. We cannot use them with the same expectations we have for “regular” software. GitHub Copilot is not a traditional code generator that gives you 100% what you need. But in 40-60% of situations, it can get you 40-80% of the way there, which is still useful. When you adjust these expectations, and give yourself some time to understand the behaviours and quirks of the eager donkey, you’ll get more out of AI coding assistants.
tl;dr sec #210
Giskard - The testing framework for ML models. See also promptfoo.
HeyGen - AI-powered video creations at scale. New features: instant avatar (create an AI version of yourself), and translate you speaking in videos to another language.
Meet Aitana: The first Spanish AI model earning up to $11K/month. The thread includes some links to useful tutorials and guides.
Noiselith: Desktop app for Stable Diffusion XL so you can easily run it locally, offline.
AutoGen's TeachableAgent: New Autogen blog post that includes examples. TeachableAgent uses TextAnalyzerAgent so that users can teach their LLM-based assistants new facts, preferences, and skills.
tl;dr sec #209
Quicklinks
LangChain Templates Hub: 60+ templates contributed by the community. Search by popularity, or filter by use-case and integration tags.
vectara/hallucination-leaderboard: Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
LLM Chain querying a scientific Zotero library, with citations (Zotero is a popular tool for academics to manage bibliography data)
Sports Illustrated Published Articles by Fake, AI-Generated Writers
GPTs
Hacker Art by rez0
Data Analysis - Drop in files and it will analyze and visualize your data
The Negotiator - It will help you advocate for yourself and get better outcomes. Become a great negotiator.
AI + Music, Images, or Video
Scribble Diffusion: Turn your sketch into a refined image using AI
Dall-E Party: Recursively generate an image with DALL-E 3, describe it with GPT4 Vision, use that description with DALL-E 3, …
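The loop is simple to sketch with the OpenAI API (this is not the site’s actual code; the model names and seed prompt are illustrative):

```python
# Sketch of the DALL-E Party loop: generate an image, have a vision model
# describe it, feed the description back in, repeat. Model names and the
# seed prompt are placeholders.
from openai import OpenAI

client = OpenAI()
prompt = "A raccoon giving a conference talk about firewalls"

for step in range(5):
    image_url = client.images.generate(
        model="dall-e-3", prompt=prompt, n=1, size="1024x1024"
    ).data[0].url
    description = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one detailed sentence."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    ).choices[0].message.content
    print(f"Step {step}: {description}")
    prompt = description  # the description becomes the next generation prompt
```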
People think white AI-generated faces are more real than actual photos, study says - Attractiveness and "averageness" of AI-generated faces made them seem more real to the study participants, while the large variety of proportions in actual faces seemed unreal.
Frigate: Monitor your security cameras with locally processed AI.
Script that takes pics using your webcam and describes you like David Attenborough using GPT-4 Vision and ElevenLabs. Worth watching the demo video.
Introducing Stable Video Diffusion - The first foundation model for generative video based on the image model Stable Diffusion.
Meta brings us closer to AI-generated movies: Given a caption, an image, or a photo paired with a description, Emu Video can generate a 4-second animated clip. A complementary tool can then edit those clips using natural language: “the same clip, but in slow motion.”
New music model from Google DeepMind: “With our music AI tools, users can create new music or instrumental sections from scratch, transform audio from one music style or instrument to another, and create instrumental and vocal accompaniments.” A limited set of creators will also be able to generate a unique soundtrack in the voice and style of participating artists like Charlie Puth, Demi Lovato, Sia, T-Pain, and more.
I’m not sure what timeline we’re in for there to be articles like this: People Can’t Access Their AI Girlfriend Because the Service Went Down After CEO Jailed for Setting His Apartment on Fire
LLMs cannot find reasoning errors, but can correct them!
Paper in which the authors break down the self-correction process into two core components: mistake finding and output correction. They find that LLMs generally struggle with finding logical mistakes, but for output correction, they propose a backtracking method which provides large improvements when given information on mistake location.
Outset is using GPT-4 to make user surveys better
YC-backed Outset uses GPT-4 to autonomously conduct and synthesize user surveys. Outset users create a survey and share the link with prospective survey takers, then Outset follows up with respondents to clarify, probe on answers and create a “conversational rapport” for deeper responses. Outset enabled WeightWatchers to conduct and synthesize over 100 interviews in 24 hours.
OpenAI Drama
AI Explained had a nice series of videos about it:
Altman’s polarizing past hints at OpenAI board’s reason for firing him
Previously Y Combinator founder Paul Graham gave Sam the boot from leading YC. Sam “had developed a reputation for favoring personal priorities over official duties and for an absenteeism that rankled his peers and some of the start-ups he was supposed to nurture.”
Re: the new OpenAI board: “Altman was unwilling to talk to anyone he didn’t already know. By Sunday, it became clear that Altman wanted a board composed of a majority of people who would let him get his way.”
“One person who has worked closely with Altman described a pattern of consistent and subtle manipulation that sows division between individuals.”
“A former OpenAI employee, machine learning researcher Geoffrey Irving, who now works at competitor Google DeepMind, wrote that he was disinclined to support Altman after working for him for two years. “1. He was always nice to me. 2. He lied to me on various occasions 3. He was deceptive, manipulative, and worse to others, including my close friends (again, only nice to me, for reasons).””
Exclusive: OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say
Supposedly several staff researchers at OpenAI wrote a letter to the board of directors warning of a powerful AI discovery that could threaten humanity. Allegedly there was a project, Q*, that was able to solve certain math problems, implying it might have greater reasoning capabilities than just predicting the next word. This could be applied to novel scientific research, for instance.
This may have been what Sam Altman meant when he said being in the room “where we push the veil of ignorance back and the frontier of discovery forward.”
OpenAI’s Misalignment and Microsoft’s Gain
Stratechery deep dive on the implications of OpenAI’s non-profit model and governance situation, internal cultural dynamics at OpenAI, Microsoft’s role, Altman’s reputation, and thoughts going forward.