# Tokenator Developer Documentation

This document provides detailed information for developers who want to use the Tokenator library in their projects or contribute to its development.

## Core Concepts

Tokenator works with two primary concepts:

1. **Token Parsing**: Converting a sequence of string tokens into structured data
2. **Token Serialization**: Converting structured data into a sequence of string tokens

The library is designed to be simple, efficient, and flexible for working with delimited string formats.

## API Reference

### TokenParser

`TokenParser` is responsible for parsing tokens from a slice of string references.

```rust
pub struct TokenParser<'a> {
    tokens: &'a [&'a str],
    index: usize,
}
```

Key methods:

- `new(tokens: &'a [&'a str]) -> Self`: Creates a new parser from a slice of string tokens
- `pull_token() -> Result<&'a str, ParseError<'a>>`: Returns the next token and advances the index
- `peek_token() -> Result<&'a str, ParseError<'a>>`: Looks at the next token without advancing the index
- `parse_token(expected: &'static str) -> Result<&'a str, ParseError<'a>>`: Checks that the next token matches the expected value
- `alt<R>(parser: &mut TokenParser<'a>, routes: &[fn(&mut TokenParser<'a>) -> Result<R, ParseError<'a>>]) -> Result<R, ParseError<'a>>`: Tries each parser in `routes` until one succeeds
- `parse_all<R>(&mut self, parse_fn: impl FnOnce(&mut Self) -> Result<R, ParseError<'a>>) -> Result<R, ParseError<'a>>`: Ensures all tokens are consumed after parsing
- `try_parse<R>(&mut self, parse_fn: impl FnOnce(&mut Self) -> Result<R, ParseError<'a>>) -> Result<R, ParseError<'a>>`: Attempts to parse and backtracks on failure
- `is_eof() -> bool`: Returns `true` once no tokens are left to parse

### TokenWriter

`TokenWriter` is responsible for serializing tokens into a string with the specified delimiter.

```rust
pub struct TokenWriter {
    delim: &'static str,
    tokens_written: usize,
    buf: Vec<u8>,
}
```

Key methods:

- `new(delim: &'static str) -> Self`: Creates a new writer with the specified delimiter
- `default() -> Self`: Creates a new writer with `":"` as the delimiter
- `write_token(token: &str)`: Appends a token to the buffer
- `str() -> &str`: Gets the current buffer as a string
- `buffer() -> &[u8]`: Gets the current buffer as a byte slice
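As a quick orientation, the sketch below round-trips a single record through both types using only the methods listed above. The `split(':')` step is our own assumption about how raw input reaches the parser; `TokenParser` itself only consumes an already-split slice.

```rust
use tokenator::{TokenParser, TokenWriter};

fn main() {
    // TokenParser consumes a pre-split slice, so tokenize the raw
    // input first (here we assume a ":"-delimited line).
    let input = "user:alice";
    let tokens: Vec<&str> = input.split(':').collect();

    let mut parser = TokenParser::new(&tokens);
    // Require the literal "user" tag, then pull the value token.
    if parser.parse_token("user").is_ok() {
        if let Ok(name) = parser.pull_token() {
            assert!(parser.is_eof());

            // Serialize the same record back out with the default ":" delimiter.
            let mut writer = TokenWriter::default();
            writer.write_token("user");
            writer.write_token(name);
            assert_eq!(writer.str(), input);
        }
    }
}
```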
### TokenSerializable

`TokenSerializable` is a trait that types can implement to be serialized to and parsed from tokens.

```rust
pub trait TokenSerializable: Sized {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, ParseError<'a>>;
    fn serialize_tokens(&self, writer: &mut TokenWriter);
}
```

### Error Handling

The library provides detailed error types:

- `ParseError<'a>`: Represents errors that can occur during parsing
  - `Incomplete`: Not done parsing yet
  - `AltAllFailed`: All parsing alternatives failed
  - `DecodeFailed`: General decoding failure
  - `HexDecodeFailed`: Hex decoding failure
  - `UnexpectedToken`: Encountered an unexpected token
  - `EOF`: No more tokens

## Advanced Usage

### Backtracking and Alternative Parsing

One of Tokenator's most useful features is its support for backtracking and alternative parsing paths:

```rust
// Try multiple parsing strategies
let result = TokenParser::alt(&mut parser, &[
    |p| parse_strategy_a(p),
    |p| parse_strategy_b(p),
    |p| parse_strategy_c(p),
]);

// Attempt to parse but backtrack on failure
let result = parser.try_parse(|p| {
    let token = p.parse_token("specific_token")?;
    // More parsing...
    Ok(token)
});
```

### Parsing Hex Data

The library includes utilities for parsing hexadecimal data:

```rust
use tokenator::parse_hex_id;

// Parse a 32-byte hex string from the next token
let hash: [u8; 32] = parse_hex_id(&mut parser)?;
```

### Custom Delimiters

You can use custom delimiters when serializing tokens:

```rust
// Create a writer with a custom delimiter
let mut writer = TokenWriter::new("|");
writer.write_token("user");
writer.write_token("alice");
// Result: "user|alice"
```

## Best Practices

1. **Implement `TokenSerializable` for your types**: This keeps parsing and serialization logic consistent.

2. **Use `try_parse` for speculative parsing**: When trying different parsing strategies, wrap them in `try_parse` so the parser backtracks properly on failure.

3. **Handle all error cases**: The detailed error types provided by Tokenator help identify and handle specific parsing issues (see the sketch after this list).

4. **Consider memory efficiency**: The parser works with string references to avoid unnecessary copying.

5. **Validate input**: Always validate input tokens before attempting to parse them into your data structures.
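To make practice 3 concrete, here is one way a caller might branch on the error variants listed earlier. This is only a sketch: it assumes every variant except `UnexpectedToken` is unit-like (the `expected`/`found` fields on `UnexpectedToken` follow the protocol example in the next section), so adjust the patterns to the actual variant payloads.

```rust
use tokenator::ParseError;

// A sketch of exhaustive error handling. We assume all variants other
// than `UnexpectedToken` carry no payload; if they do in the actual
// library, the patterns below need matching adjustments.
fn describe(err: &ParseError<'_>) -> String {
    match err {
        ParseError::Incomplete => "input ended before the record was complete".into(),
        ParseError::AltAllFailed => "every alternative parser failed".into(),
        ParseError::DecodeFailed => "token could not be decoded".into(),
        ParseError::HexDecodeFailed => "token was not valid hex".into(),
        ParseError::UnexpectedToken(tok) => {
            format!("expected {}, found {}", tok.expected, tok.found)
        }
        ParseError::EOF => "ran out of tokens".into(),
    }
}
```

Centralizing this mapping lets call sites bubble errors up with `?` while still producing precise diagnostics at the boundary.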
## Integration Examples

### Custom Protocol Parser

```rust
use tokenator::{TokenParser, TokenWriter, TokenSerializable, ParseError};

enum Command {
    Get { key: String },
    Set { key: String, value: String },
    Delete { key: String },
}

impl TokenSerializable for Command {
    fn parse_from_tokens<'a>(parser: &mut TokenParser<'a>) -> Result<Self, ParseError<'a>> {
        let cmd = parser.pull_token()?;

        match cmd {
            "GET" => {
                let key = parser.pull_token()?.to_string();
                Ok(Command::Get { key })
            }
            "SET" => {
                let key = parser.pull_token()?.to_string();
                let value = parser.pull_token()?.to_string();
                Ok(Command::Set { key, value })
            }
            "DEL" => {
                let key = parser.pull_token()?.to_string();
                Ok(Command::Delete { key })
            }
            _ => Err(ParseError::UnexpectedToken(tokenator::UnexpectedToken {
                expected: "GET, SET, or DEL",
                found: cmd,
            })),
        }
    }

    fn serialize_tokens(&self, writer: &mut TokenWriter) {
        match self {
            Command::Get { key } => {
                writer.write_token("GET");
                writer.write_token(key);
            }
            Command::Set { key, value } => {
                writer.write_token("SET");
                writer.write_token(key);
                writer.write_token(value);
            }
            Command::Delete { key } => {
                writer.write_token("DEL");
                writer.write_token(key);
            }
        }
    }
}
```

## Contributing

Contributions to Tokenator are welcome! Here are some areas that could be improved:

- Additional parsing utilities
- Performance optimizations
- More comprehensive test coverage
- Example implementations for common use cases
- Documentation improvements

When submitting a pull request, please ensure:

1. All tests pass
2. New functionality includes appropriate tests
3. Documentation is updated to reflect changes
4. Code follows the existing style conventions