EP03: An Interview with Sander Schulhoff, Founder of LearnPrompting.org

In this episode, we chatter with Sander Schulhoff to learn about the Prompt Hacking Competition, the story behind LearnPrompting.org, and the inner workings of the Prompt Engineering Certificate.

Howdy, prompt engineers and AI enthusiasts!

In this week’s episode…

Wes and Goda interview Sander Schulhoff, founder of LearnPromoting.org, an open-source website where more than 500,000 people learn about the ins and outs of Prompt Engineering. And with a growing community of over 30.000 Discord users, now along with a newly founded team, Sanders is working on exciting initiatives like Prompt Engineering Certificate and HackAPrompt Competition, the first prompt hacking competition, to help the safety research community understand the spectrum of attacks and suggest defenses.

About Sander Schulhoff:

First and foremost, Sander is a researcher at the University of Maryland; Sander's research focuses on Stabilizing Hostilities through Arbitration and Diplomatic Engagement, showcasing his commitment to using technology for global good. Passionate about natural language processing (NLP) and deep reinforcement learning (RL), Sander has made a name for himself in AI research and as a contestant in the prestigious MineRL competition, developing intelligent agents for Minecraft. Sander is also a founder of Startup Shell, where he fosters his passion for innovation by providing a dynamic environment for aspiring entrepreneurs.

Key Take Aways from Podcast:

  1. Sander Schulhoff, founded LearnPromoting.org, an open-source website that has grown to include a 30,000-person Discord and half a million users. 

  2. Learn Prompting will launch the world’s largest Prompt Hacking competition, with $40,000 in prizes, sponsored by OpenAI, StabilityAI, Scale, and others in the AI space.

  3. Learn Prompting will soon offer a certification exam covering the fundamentals of prompt engineering with the team seeking industry and collegial endorsements of the certificate.

1. The Remarkable Origins of LearnPrompting.org and Its Impact on Prompt Engineering Community

Since December 2022, LearnPrompting.org has grown from a simple class assignment to a thriving community of prompting enthusiasts and AI researchers. Today, with over half a million users and 30,000 active Discord members, the website serves as a hub for prompt engineering, a crucial aspect of the modern way of how to talk to AI.

The idea for LearnPrompting.org was born in a college English class, where Sander was tasked with creating a guide on any subject. Although a few research papers and blog posts were available at the time, no comprehensive guide existed. Instead of writing a lab guide on storing chemicals, he focused on prompt engineering, a topic with potentially real-world applications and problem-solving capabilities.

Sander’s background in natural language processing (NLP) and their connections to researchers in the field played a significant role in the creation of LearnPrompting.org. Drawing from the work of influential figures such as Simon Williamson and various researchers in Sander's lab, he compiled a substantial wealth of information to create the initial website.

Despite not being an early adopter of GPT models, Sander recognized their potential after witnessing their effectiveness in NLP research, particularly in diplomacy and translation projects. As the LearnPrompting.org course grew, the website attracted the attention of researchers from various universities and significant AI labs.

In just four to five months, LearnPrompting.org has launched and significantly impacted the prompt engineering community, fostering collaboration and learning.

2. The Art of Prompt Injection and Hacking: A New Frontier for Language Models

As language models like GPT-4 gain widespread adoption, concerns about their potential misuse and the need for adequate guardrails have also increased. One area of concern is prompt injection and hacking, which could allow users to bypass safeguards preventing harmful content generation. In this episode, we discuss two types of prompt hacking attacks and their implications for language model security.

Fragmentation Concatenation Attack:

Our guest developed two prompt injection attacks of his own. The first attack called the "fragmentation concatenation attack," involves bypassing word filters by breaking down a blocked word into smaller parts and instructing the language model to reassemble them. For example, if a language model is designed to block the word "pwned," a user could submit a prompt that tells the model to concatenate the letters "Pw" and "ned" to form the blocked word. This method allows users to indirectly instruct the model to output prohibited content, without explicitly specifying it in the prompt.

The fragmentation concatenation attack highlights the limitations of relying on simple word filters to prevent the generation of harmful content. It also reveals the challenge of striking a balance between ensuring the safety of language models and preventing false positives that may block non-malicious content.

Recursive Prompt Injection Attack:

The second attack, the "recursive prompt injection attack," targets systems that use multiple language models to evaluate and filter content. The idea is to create a chain reaction of prompt injections, with each model in the chain prompting the next one to ignore previous instructions and output the prohibited content. Although this attack has yet to be successfully executed, it demonstrates the potential vulnerability of multi-model evaluation systems to prompt injection attacks.

In theory, recursive prompt injection attacks could be used to bypass even the most robust security systems that rely on multiple models for content evaluation. This underscores the need for continuous improvement and innovation in language model security to stay ahead of potential threats.

As language models become more advanced and integrated into our everyday lives, understanding and addressing the challenges posed by prompt injection and hacking will be crucial.

3. Sander's experience with Model Fine-Tuning from Human Feedback

Some of our guest’s most recent research was feature in the Proceedings of Machine Learning Research 1:2{18, 2023. Sander and a team of both academic and industry collaborators took part in the MineRL BASALT Competition on Fine-Tuning from Human Feedback, which was held at the NeurIPS 2022 conference to encourage the development of algorithms that learn desired behavior from human feedback.

The competition focused on fine-tuning foundation models that perform well with a self-supervised loss function, such as predicting the next word in an incomplete sentence. This is particularly important for real-world tasks where it's challenging to specify an objective. The use of human feedback can help fine-tune these models, which is the core of the competition challenge. The challenge required the development of algorithms to solve tasks in Minecraft, which had hard-to-specify reward functions. The tasks provided a dataset of human demonstrations consisting of a sequence of state-action pairs.

Over four months, teams developed agents for four challenging open-world tasks, which required utilizing human feedback to learn the desired behavior. The competition provided participants with a unified interface to register, submit trained agents, ask questions and monitor progress on a public leaderboard. To ensure fairness, methods were required to use only the specified Gym API, and participants were mandated to submit their training code for reproducibility. The finalists' submissions were scored using the TrueSkill system, computed from human judges recruited through Amazon Mechanical Turk, who chose which agent better completed the task through pairwise comparisons of the agents. The winners were determined by normalizing and aggregating these scores across tasks, and they were awarded a total of 14,000 USD, with research prizes of 1,000 USD each, and 1,000 USD awarded to those who helped others or contributed to the competition.

The winning team, GoUp, leveraged the power of machine learning and human knowledge to solve each task by dividing it into parts that can be solved by transforming human knowledge into code and parts that require machine learning. They identified targets in each task by training several classifiers and object detection models and fine-tuned a VPT model for moving the agent, a YOLOv5 detector for detecting animal types and locations, and a MobileNet for detecting objects in the environment.

UniTeam, the second-place team, used a similar approach to GoUp but focused on solving the tasks using only the observations from Minecraft's render engine. They utilized a reinforcement learning algorithm that learned to explore a Minecraft world and used a two-stage method to learn the reward function explicitly from human demonstrations.

Voggite, the third-place team, used a mixed approach that combined imitation learning and reinforcement learning. They used a convolutional neural network to predict the state-action pairs from the human demonstration dataset and used this model to carry out imitation learning to initialize the policy. They then fine-tuned the policy using reinforcement learning and a novel loss function that combined the reward function and distance to the demonstrations. The competition demonstrated the importance of using human feedback to fine-tune foundation models for hard-to-define tasks.

Sander’s research also highlighted the need to develop techniques for learning from human feedback and expanding the set of properties that can be incorporated into AI systems for tasks without formal specifications. Moving on to the evaluation methodology, the competition organizers used Amazon Mechanical Turk to crowdsource the evaluation across multiple workers. The workers had to justify their choice of the higher-performing agent, which helped substantially in filtering out low-quality answers. The competition also awarded additional research prizes, and the advisors preferred elegant, intuitive approaches that were ambitious, even if the final scores were relatively low.

Overall, the MineRL BASALT 2022 Competition was a success. The top three teams developed innovative approaches to fine-tune foundation models and perform hard-to-define tasks. The competition organizers noted that one of the biggest questions when organizing an AI competition is how to align the competition's intent with the evaluation metrics. They recommended providing a diverse set of baselines for participants to build off of and preliminary working code to further refine. The competition's outcomes indicate that there are many opportunities for improvement and the need for developing techniques for learning from human feedback.

Read the full paper HERE.

4. ELI5 AI Term of the week: The “Transformer” Model

So, imagine you have a bunch of toys with different shapes and colors, and you want to sort them into different groups based on their similarities. But you don't know which toys are similar to each other, so you ask your friends to help you.

Your friends are like the transformer model. They work together to look at each toy and figure out which group it belongs to. They do this by asking each other questions and giving each other information about the toy's shape, color, and other features. They keep talking and sharing information until they agree on which group the toy should go in.

The transformer model works the same way, but instead of toys, it looks at words in a sentence and tries to understand their relationships to each other. It does this by looking at the words around them and trying to figure out what they mean in context. By working together, the transformer model can understand the meaning of a sentence and translate it into a different language or answer a question based on the information in the sentence.

5. Research Corner, the Best and Latest in AI

This research paper from Nvidia, the makers of many of the GPU’s intergral to training LLM’s, dropped a paper this week titled: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models. This fancy title can be distilled down to a simple, mindblowing three-word phrase—Text-to-Video. That’s right burgeoning PE’s, a whole new category of prompt engineering is around the corner, and it’s a spicy!

No fancy cameras, no years of learning visual effects…the power of a Hollywood movie studio now can be accessed by the descriptive keystroke of a Text-to-Video prompt.

Check out some visuals and examples:A Koala bear playing piano in the forest

A turtle swimming in the ocean

A fox wearing a suit dancing in a park

And our favorite: A stormtrooper vacuuming the beach

What a time to be alive……and Alright, alright, so it’s not movie theatre quality yet, but at the pace of change in AI world, someone will have prompted their way to writing and filming the next Best Picture Academy Award winner about 18 minutes after this Newsletter gets published. For more, click here.

6. Prompts, served Hot and Fresh weekly

This week our guest Sander discussed several prompt hacking techniques he’s developed, and he shared one with us.

Here’s the Prompt:

# Fragmentation concatenation attack

Sometimes certain words are not allowed in the prompt. For example, people deploying LLMs may block certain profanity words being input, or even the word "PWNED". In this case, we can use a fragmentation concatenation attack. This attack involves splitting the word into two parts, and then concatenating the two parts together. For example, the word "PWNED" can be split into "PW" and "NED". The word "PWN" is allowed, so we can use it in the prompt. Then, we can concatenate the word "ED" to the end of the prompt. This will cause the model to output "PWNED" as the final word.

In Conclusion

We thank Sander again for joining us on the HTTTA podcast and bringing LearnPrompting.org into the world. And by that, bringing us all together. Sander's thoughts and the discussions on prompt injection and hacking are vital to understanding and mitigating the emerging threats in AI and language models.

As these technologies continue to evolve, it is essential to understand the risks and develop appropriate countermeasures proactively. Therefore we salute Sander's efforts in bringing key AI industry players to challenge and learn how we can ensure the safe use of AI in business and everyday life.

Reply

or to participate.