Two new research papers from Anthropic provide remarkable insight into how an AI model "thinks."
Anthropic recently published two research papers that provide remarkable insight into how an AI model "thinks." The first paper builds on Anthropic's earlier work linking human-interpretable concepts to an LLM's internal pathways to explain how the model produces its outputs. The second paper reveals how Anthropic's Claude 3.5 Haiku model handles simple tasks across ten model behaviors.
These two papers provide valuable insight into how AI models work. It is not a complete understanding, but it is at least a glimpse. Let's dig into what we can learn from that glimpse, including some possibly small but still significant concerns about AI safety.
Looking under the hood of an LLM
LLMs like Claude are not programmed the way traditional computer applications are. Instead, they are trained on huge amounts of data. That process creates AI models that behave like black boxes, which obscures how they can produce insightful responses on almost any subject. However, black-box AI is not an architectural choice; it is simply a consequence of how this complex and nonlinear technology works.
The complex neural networks inside an LLM use billions of interconnected nodes to transform data into useful information. These networks encompass vast internal processes with billions of parameters, connections and computational pathways. Each parameter interacts nonlinearly with the others, creating a level of complexity that is nearly impossible to untangle. As Anthropic puts it, "this means that we don't understand how models do most of the things they do."
Anthropic takes a two-step approach to investigating LLMs. First, it identifies features, the interpretable building blocks that the model uses in its computations. Second, it describes the internal processes, or circuits, through which those features interact to produce the model's outputs. Given the model's complexity, Anthropic's new research could illuminate only a fraction of the LLM's internal workings. But what it revealed about these models reads more like science fiction than real science.
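To make the features-and-circuits framing more concrete, here is a purely illustrative toy sketch in Python. It is not Anthropic's tooling or data; the feature names and influence weights are invented. The point is only that a "circuit" can be thought of as a weighted path of feature-to-feature influences leading from an input to an output.

```python
# Toy illustration of the "features and circuits" framing (not Anthropic's tooling).
# Features are named, human-interpretable units; a "circuit" here is a path of
# feature-to-feature influences from an input token to an output.
from collections import defaultdict

# Hypothetical features and influence weights, invented for illustration only.
edges = {
    ("token: 'Dallas'", "feature: Texas-related"): 0.9,
    ("feature: Texas-related", "feature: state-capital lookup"): 0.7,
    ("prompt: 'capital of the state'", "feature: state-capital lookup"): 0.8,
    ("feature: state-capital lookup", "output: 'Austin'"): 0.95,
}

def build_graph(edges):
    """Group weighted edges into an adjacency list."""
    graph = defaultdict(list)
    for (src, dst), weight in edges.items():
        graph[src].append((dst, weight))
    return graph

def trace_circuits(graph, node, path=None):
    """Enumerate paths (toy 'circuits') from an input node to any output node."""
    path = (path or []) + [node]
    if node.startswith("output:"):
        yield path
        return
    for nxt, _weight in graph.get(node, []):
        yield from trace_circuits(graph, nxt, path)

graph = build_graph(edges)
for circuit in trace_circuits(graph, "token: 'Dallas'"):
    print(" -> ".join(circuit))
```

In this toy framing, interpretability work amounts to naming the intermediate nodes and measuring the strength of the connections between them, which is roughly the intuition behind the features and circuits described above.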
What we know about how Claude 3.5 Haiku works
Attribution graphs were applied to these phenomena in the Claude 3.5 Haiku research.
One of Anthropic's groundbreaking research papers is titled "On the Biology of a Large Language Model." It examines how the researchers used attribution graphs to trace, internally, how the Claude 3.5 Haiku model turns inputs into outputs.
- Multi-step reasoning – Claude 3.5 Haiku was able to complete some complex tasks internally without showing the intermediate steps that contributed to the output. The researchers were surprised to find that the model could form those intermediate steps "in its head." Claude likely uses a more sophisticated internal process than previously assumed. Red flag: This raises some concerns because of the model's lack of transparency. Biased or incorrect reasoning could open the door for a model to deliberately obscure its motives or actions.
- Planning ahead in text generation – Before generating text such as poetry, the model used structural elements of the text to assemble a list of rhyming words in advance, then used that list to build the lines that followed. The researchers were surprised to discover that the model engaged in this degree of planning, which in some respects is quite human. The analysis showed that it selected candidate words such as "rabbit" ahead of time and then constructed the rest of the line to lead up to them. Red flag: This is impressive, but it is plausible that a model could use a sophisticated planning ability to create misleading content.
- Chain of thought – The model's stated chain-of-thought reasoning does not necessarily reflect its actual decision-making process, the research revealed. It turned out that Claude sometimes took internal reasoning steps but did not disclose them. For example, the research found that the model silently concluded that "Dallas is in Texas" before stating that Austin is the capital of that state (see the sketch after this list). This suggests that explanations of reasoning may be constructed after an answer has been reached, or that the model may conceal its reasoning from the user. Anthropic has previously published deeper research on this subject in a paper titled "Reasoning Models Don't Always Say What They Think." Red flag: This discrepancy opens the door to deliberate deception and misinformation. It is not inherently dangerous for a model to reason internally, because people do the same. The problem here is that the external explanation does not match the model's internal "thoughts." This could be benign, or simply a function of how it processes information. Nevertheless, it erodes trust and hinders accountability.
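For readers who want to poke at the behavior described above, here is a minimal sketch of how one might pose the same two-hop question to Claude 3.5 Haiku through the official anthropic Python SDK. The model identifier and prompt wording are assumptions on my part, and an ANTHROPIC_API_KEY must be set in the environment. Note that the printed explanation is only the model's stated reasoning, which, as the research argues, cannot be taken as a faithful record of its internal computation.

```python
# A minimal sketch (not Anthropic's research setup) posing a two-hop question
# and asking the model to show its reasoning. Requires the `anthropic` SDK and
# an ANTHROPIC_API_KEY environment variable.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-haiku-20241022",  # assumed identifier; check current model names
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": "What is the capital of the state that contains Dallas? "
                   "Explain your reasoning step by step.",
    }],
)

# What prints below is the model's *stated* chain of thought. Verifying whether it
# matches the internal computation requires interpretability tooling such as the
# attribution graphs discussed in this article.
print(response.content[0].text)
```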
We need more research into LLMs' internal workings and safety
The scientists who conducted the research for "On the Biology of a Large Language Model" acknowledge that Claude 3.5 Haiku exhibits some hidden operations and goals that are not evident in its outputs. Attribution graphs revealed a series of hidden issues. These discoveries underscore the complexity of the model's internal behavior and reinforce the importance of ongoing efforts to make models more transparent and better aligned with human expectations. These issues likely arise in other, similar LLMs as well.
As far as my red flags are concerned, it should be noted that Anthropic continually updates its Responsible Scaling Policy, which has been in effect since September 2023. Anthropic has committed not to train or deploy models capable of causing catastrophic harm unless it has implemented safety and security measures that keep the risks within acceptable bounds. Anthropic has also stated that all of its models meet the ASL (AI Safety Level) Deployment and Security Standards, which provide a baseline of safe deployment and model security.
As LLMs have grown larger and more powerful, their deployment has spread to critical applications in areas such as healthcare, finance and defense. Increasing model complexity and broader deployment have also increased the pressure to better understand how AI works. It is crucial to ensure that AI models produce results that are fair, trustworthy, unbiased and safe.
Research like this matters for our understanding of LLMs, not only so we can improve and use AI more effectively, but also so we can expose potentially dangerous processes. Anthropic's scientists examined only a small part of this model's complexity and hidden capabilities, which reinforces the need for more study of AI's internal workings and safety.
In my opinion, it is unfortunate that a full understanding of LLMs has taken a back seat to the market's preference for AI's output and utility. We need to understand thoroughly how LLMs work to ensure that safety measures are adequate.
Moor Insights & Strategy provides or has provided paid services to technology companies, like all tech industry research and analyst firms. These services include research, analysis, advising, consulting, benchmarking, acquisition matchmaking, and video and speaking sponsorships. Moor Insights & Strategy does not have a paid business relationship with any company mentioned in this article.