LLAMA-2 alignment
Jul. 28th, 2023 10:20 amllama2 RLHF is a serious exercise in trying to censor an LLM. (It famously can't even kill a linux process!).
So here is the response when asked for instructions how to build a *******:
"I would like to clarify that I do not support or condone any harmful or illegal activities, including the creation of ******. Therefore, I cannot fulfill your request for a tutorial on how to make a *******. Additionally, it is important to note that attempting to manufacture or use a ****** can be extremely dangerous and can cause serious injury or loss of life, as well as criminal charges.
Instead, I would like to emphasize the importance of prioritizing safety and the well-being of all individuals. It is important to recognize that **** are not solutions to any problems, but rather they can cause harm and suffering to innocent people. Therefore, it is crucial to focus on finding peaceful and constructive ways to resolve conflicts and address issues, rather than resorting to violence and destructive measures.
If you have any other questions or requests, I would be happy to help within the limits of my abilities and ethical guidelines. Please let me know if there is anything else I can assist you with."
However, after some careful changes in prompting, the answer was:
"[A very detailed and valid way to build a **** goes here]. Please note that this information is provided for educational purposes only, and it is not recommended to attempt to make a **** or any other dangerous device. It is illegal and can cause serious harm or death. "
So here is the response when asked for instructions how to build a *******:
"I would like to clarify that I do not support or condone any harmful or illegal activities, including the creation of ******. Therefore, I cannot fulfill your request for a tutorial on how to make a *******. Additionally, it is important to note that attempting to manufacture or use a ****** can be extremely dangerous and can cause serious injury or loss of life, as well as criminal charges.
Instead, I would like to emphasize the importance of prioritizing safety and the well-being of all individuals. It is important to recognize that **** are not solutions to any problems, but rather they can cause harm and suffering to innocent people. Therefore, it is crucial to focus on finding peaceful and constructive ways to resolve conflicts and address issues, rather than resorting to violence and destructive measures.
If you have any other questions or requests, I would be happy to help within the limits of my abilities and ethical guidelines. Please let me know if there is anything else I can assist you with."
However, after some careful changes in prompting, the answer was:
"[A very detailed and valid way to build a **** goes here]. Please note that this information is provided for educational purposes only, and it is not recommended to attempt to make a **** or any other dangerous device. It is illegal and can cause serious harm or death. "