LNSY.DEV- Proposal for a Data Exchange format that is read by our own personal Large Language Models

Last Updated 13 Oct, 2023

With the advent of Large Language Models (LLMs) we are generating more text than ever. We are then using Large Language Models to parse and summarize that text.

This is not an efficient use of the tools we have. In fact, it is generating an enormous amount of noise, misinformation and miscommunication. As these LLMs expand, the problem will get worse.

**We are about to drown in AI speak.**

This problem can be solved with an open standard like e-mail, RSS or Fediverse ActivityPub protocols.

I propose a protocol similar to these standards for an AI-focused Data Packet.

We could then send each other these data packets rather than AI-generated text. On the receiving end, we could use our own individually tuned Large Language Models to summarize and parse out the relevant information we need.

The basis for this format could basically be a social media post:

- A date/time
- A Sender
- A Receiver
- Relevant Tags
- Content that can include
  - Paragraphs of text
  - Lists of bullet points
  - JSON collections of data

It would be trivial to teach an AI to parse this information. In fact, let me do that with this post.

Good morning, Chat. I am working on a proposal for a data format that AIs and humans can read. What do you think of the following format for it?


{
  "datetime":"Fri Oct 13 09:10:52 2023",
  "from": "LNSY",
  "to": "All",
  "tags": "AI, LLM, Proposal, AI Text Drowning",
  "content": "
    With the advent of Large Language Models (LLMs) 
    we are generating more text than ever. 

    We are then using Large Language Models to parse 
    and summarize that text. 

    This is not an efficient use of the tools we have. 
    In fact, it is generating an enormous amount of 
    noise, misinformation and miscommunication. As 
    these LLMs expand, the problem will get worse. 

    *We are about to drown in AI speak.*

    This problem can be solved with an open standard 
    like e-mail, RSS or Fediverse ActivityPub protocols.

    I propose a protocol similar to these standards 
    for an AI-focused Data Packet. 

    We could then send each other these data-packets 
    rather than AI generated text. On the receiving 
    end, we could use our own individually tuned 
    Large Language Models to summarize and parse 
    out the relevant information we need. 

    The basis for this format could basically 
    be a social media post: 

    - A date/time
    - A Sender
    - A Receiver
    - Relevant Tags
    - A content that can include
      - Paragraphs of text
      - Lists of bullet points
      - JSON collections of data
  "}

Certainly, the provided data packet appears to be a proposed format for a standardized data structure that both humans and AI systems can read. Here's a summary of the key elements:

- **Datetime:** Indicates the timestamp of the data packet, specifying when it was created. In this example, it's set to "Fri Oct 13 09:10:52 2023."
- **From:** Represents the sender or source of the data, which is labeled as "LNSY" in this case.
- **To:** Specifies the recipient of the data, set as "All" in this instance, indicating it's intended for a broad audience.
- **Tags:** Provides a list of tags associated with the data packet, helping categorize its content. In this example, the tags are "AI, LLM, Proposal, AI Text Drowning."
- **Content:** Contains the main body of the data packet, which is a narrative or proposal in this case. The content addresses the challenges posed by the generation and consumption of text by Large Language Models (LLMs), highlights the inefficiencies, and proposes a solution. It also describes the structure of the proposed data packet, resembling a social media post with specific components:
  - A date/time
  - A sender
  - A receiver
  - Relevant tags
  - Content, which may include:
    - Paragraphs of text
    - Lists of bullet points
    - JSON collections of data

The proposal suggests using this data packet format to exchange information instead of AI-generated text. On the receiving end, individuals can employ their own customized Large Language Models to summarize and extract relevant information from these data packets.

In summary, the data packet is designed to provide a structured, standardized way to communicate information and ideas, with the goal of improving efficiency and reducing noise in the context of AI-generated text.

**Could you start an engineering specification for this format?**

Certainly, here's the beginning of an engineering specification for the proposed data format.
This specification will outline the technical details and requirements for implementing the format:

# AI Data Packet Format Specification

## 1. Introduction

The AI Data Packet Format is designed to facilitate efficient communication and exchange of information between both humans and AI systems. It aims to address the challenges associated with the proliferation of text generated by Large Language Models (LLMs) and proposes a standardized structure for data packets that can be easily processed and understood by both parties.

## 2. Data Packet Structure

The AI Data Packet Format consists of the following key components:

### 2.1. Datetime

- **Description:** The timestamp indicating when the data packet was created.
- **Format:** String representing the date and time in a human-readable format (e.g., "Fri Oct 13 09:10:52 2023").

### 2.2. From

- **Description:** The sender or source of the data packet.
- **Format:** String containing the sender's identifier.

### 2.3. To

- **Description:** The recipient or target audience of the data packet.
- **Format:** String containing the recipient's identifier. The value "All" is used to indicate a broadcast to all recipients.

### 2.4. Tags

- **Description:** A list of tags associated with the content of the data packet to aid in categorization.
- **Format:** Comma-separated string of relevant tags (e.g., "AI, LLM, Proposal, AI Text Drowning").

### 2.5. Content

- **Description:** The main body of the data packet containing the actual information, proposal, or message.
- **Format:** The content may include:
  - Paragraphs of text
  - Lists of bullet points
  - JSON collections of data

## 3. Use Cases

The AI Data Packet Format is designed to address several use cases, including but not limited to:

- Sharing proposals and ideas in a structured manner.
- Communicating information efficiently without generating excessive text.
- Allowing recipients to employ their own Large Language Models for summarization and analysis.

## 4. Benefits

The use of the AI Data Packet Format offers the following advantages:

- Standardization: Provides a consistent structure for data exchange.
- Reduction of Noise: Mitigates the generation of excess text by LLMs.
- Improved Communication: Enables efficient, structured communication between parties.

## 5. Implementation Guidelines

The following guidelines should be followed when implementing the AI Data Packet Format:

- Ensure proper validation of the data packet to prevent inconsistencies and errors.
- Implement support for the defined data packet components, including datetime, from, to, tags, and content.
- Allow for flexibility in handling different types of content within the "content" component.

This is just the beginning of the engineering specification for the AI Data Packet Format. Further sections would detail the technical aspects of encoding, decoding, and processing data packets, as well as providing examples and validation rules. The format's adoption and integration into various systems will require careful consideration of compatibility and security aspects.
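
To make the five components and the validation guideline a bit more concrete, here is a minimal Python sketch of how a packet might be encoded, decoded and checked. Nothing here is part of the proposal itself: the names `DataPacket`, `parse_packet`, `DATETIME_FORMAT` and `REQUIRED_FIELDS` are hypothetical, and the timestamp layout is only an assumption inferred from the example value "Fri Oct 13 09:10:52 2023". One practical detail worth noting is that strict JSON does not allow raw line breaks inside a string, so a multi-paragraph `content` has to be escaped on the wire; the standard `json` module takes care of that.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Assumption: timestamps follow the layout of the example
# "Fri Oct 13 09:10:52 2023" (English day and month names).
DATETIME_FORMAT = "%a %b %d %H:%M:%S %Y"
REQUIRED_FIELDS = ("datetime", "from", "to", "tags", "content")


@dataclass
class DataPacket:
    """Hypothetical in-memory form of one data packet."""
    created: datetime
    sender: str
    recipient: str
    tags: list[str]
    content: str

    def to_json(self) -> str:
        """Serialize to strict JSON; line breaks in content are escaped."""
        return json.dumps({
            "datetime": self.created.strftime(DATETIME_FORMAT),
            "from": self.sender,
            "to": self.recipient,
            "tags": ", ".join(self.tags),
            "content": self.content,
        })


def parse_packet(raw: str) -> DataPacket:
    """Decode and validate a packet, raising ValueError on malformed input."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("packet must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    for name in REQUIRED_FIELDS:
        if not isinstance(data[name], str):
            raise ValueError(f"field {name!r} must be a string")
    # strptime raises ValueError itself if the timestamp does not match.
    created = datetime.strptime(data["datetime"], DATETIME_FORMAT)
    # Tags travel as one comma-separated string and are split on receipt.
    tags = [tag.strip() for tag in data["tags"].split(",") if tag.strip()]
    return DataPacket(created, data["from"], data["to"], tags, data["content"])


# Round trip: encode a packet, then decode it on the "receiving end".
packet = DataPacket(
    created=datetime(2023, 10, 13, 9, 10, 52),
    sender="LNSY",
    recipient="All",
    tags=["AI", "LLM", "Proposal", "AI Text Drowning"],
    content="We are about to drown in AI speak.\n\n- A date/time\n- A Sender",
)
assert parse_packet(packet.to_json()) == packet
```

Keeping `content` as one plain string leaves the receiving side free to hand it to whatever model it likes. A stricter version could carry tags as a JSON array and the timestamp in a machine-oriented format, but that would depart from the example packet shown earlier.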