| Class | CodeRay::Tokens |
| In: | lib/coderay/tokens.rb, lib/coderay/token_classes.rb |
| Parent: | Array |
The Tokens class represents a list of tokens returned from a Scanner.
A token is not a special object, just a two-element Array consisting of the token text (a String holding that part of the input) and the token kind (a Symbol naming its type).
A token looks like this:
  ['# It looks like this', :comment]
  ['3.1415926', :float]
  ['$^', :error]
Some scanners also yield sub-tokens, represented by special token actions, namely :open and :close.
The Ruby scanner, for example, splits "a string" into:
[ [:open, :string], ['"', :delimiter], ['a string', :content], ['"', :delimiter], [:close, :string] ]
Tokens is the interface between Scanners and Encoders: The input is split and saved into a Tokens object. The Encoder then builds the output from this object.
Thus, the syntax below becomes clear:
CodeRay.scan('price = 2.59', :ruby).html
# the Tokens object is here -------^
See how small it is? ;)
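The token list behind that call can be pictured as a plain Array of [text, kind] pairs. A minimal sketch (the exact kinds like :ident are illustrative assumptions, not necessarily what the Ruby scanner emits):

```ruby
# A stand-in for the Tokens object produced by scanning 'price = 2.59':
# just an Array of [text, kind] pairs (kinds here are illustrative).
tokens = [
  ['price', :ident],
  [' ',     :space],
  ['=',     :operator],
  [' ',     :space],
  ['2.59',  :float],
]

# Joining the texts recovers the original input.
source = tokens.map { |text, _kind| text }.join
puts source  # prints "price = 2.59"
```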
Tokens gives you the power to handle pre-scanned code very easily: You can convert it to a webpage, a YAML file, or dump it into a gzip'ed string that you put in your DB.
It also allows you to generate tokens directly (without using a scanner), to load them from a file, and still use any Encoder that CodeRay provides.
Tokens’ subclass TokenStream allows streaming to save memory.
| ClassOfKind | = | Hash.new { |h, k| h[k] = k.to_s } |
| scanner | [RW] | The Scanner instance that created the tokens. |
Unzips the dump using GZip.gunzip, then undumps the object using Marshal.load.
The result is commonly a Tokens object, but this is not guaranteed.
  # File lib/coderay/tokens.rb, line 267
  def Tokens.load dump
    require 'coderay/helpers/gzip_simple'
    dump = dump.gunzip
    @dump = Marshal.load dump
  end
Dumps the object into a String that can be saved in files or databases.
The dump is created with Marshal.dump; in addition, it is gzipped using GZip.gzip.
The returned String object includes Undumping so it has an undump method. See Tokens.load.
You can configure the level of compression, but the default value 7 should be what you want in most cases as it is a good compromise between speed and compression rate.
See GZip module.
  # File lib/coderay/tokens.rb, line 228
  def dump gzip_level = 7
    require 'coderay/helpers/gzip_simple'
    dump = Marshal.dump self
    dump = dump.gzip gzip_level
    dump.extend Undumping
  end
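The dump/load roundtrip can be sketched with the standard library alone. Zlib.gzip stands in here for CodeRay's GZip helper (an approximation of the same idea, not the library's own code path):

```ruby
require 'zlib'

tokens = [['# hi', :comment], ["\n", :space]]

# Tokens#dump is essentially Marshal.dump plus gzip at level 7;
# Zlib from the standard library does the same job in this sketch.
dump = Zlib.gzip(Marshal.dump(tokens), level: 7)

# Tokens.load reverses the process: gunzip first, then Marshal.load.
restored = Marshal.load(Zlib.gunzip(dump))
restored == tokens  # true
```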
Iterates over all tokens.
If a filter is given, only tokens of that kind are yielded.
  # File lib/coderay/tokens.rb, line 67
  def each kind_filter = nil, &block
    unless kind_filter
      super(&block)
    else
      super() do |text, kind|
        next unless kind == kind_filter
        yield text, kind
      end
    end
  end
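On a plain Array of [text, kind] pairs, the filtered iteration behaves like this sketch (the token texts and kinds are made up for illustration):

```ruby
tokens = [['x', :ident], [' ', :space], ['1', :integer], ['2', :integer]]

# Equivalent of tokens.each(:integer) { |text, kind| ... }:
# pairs whose kind does not match the filter are skipped.
numbers = []
tokens.each do |text, kind|
  next unless kind == :integer
  numbers << text
end
numbers  # ["1", "2"]
```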
Iterates over all text tokens. Range tokens like [:open, :string] are left out.
Example:
tokens.each_text_token { |text, kind| text.replace html_escape(text) }
  # File lib/coderay/tokens.rb, line 83
  def each_text_token
    each do |text, kind|
      next unless text.is_a? ::String
      yield text, kind
    end
  end
Encode the tokens using encoder.
encoder can be a plain symbol (looked up via Encoders[], e.g. :html), an Encoder class, or an Encoder instance.
options are passed to the encoder.
  # File lib/coderay/tokens.rb, line 98
  def encode encoder, options = {}
    unless encoder.is_a? Encoders::Encoder
      # Resolve a symbol or an Encoder class to an Encoder instance.
      encoder_class = encoder.is_a?(Class) ? encoder : Encoders[encoder]
      encoder = encoder_class.new options
    end
    encoder.encode_tokens self, options
  end
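The dispatch can be mimicked without the gem. TextEncoder and ENCODERS below are hypothetical stand-ins for CodeRay's Encoders namespace, just to show the three accepted forms:

```ruby
# Hypothetical minimal encoder: concatenates all token texts.
class TextEncoder
  def encode_tokens tokens
    tokens.map { |text, _kind| text }.join
  end
end

ENCODERS = { text: TextEncoder }  # stands in for Encoders[...]

# Mirror of #encode's dispatch: accept an instance, a class, or a symbol.
def resolve encoder
  return encoder if encoder.respond_to? :encode_tokens
  klass = encoder.is_a?(Class) ? encoder : ENCODERS.fetch(encoder)
  klass.new
end

tokens = [['a = ', :plain], ['1', :integer]]
resolve(:text).encode_tokens(tokens)  # "a = 1"
```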
Ensure that all :open tokens have a corresponding :close one.
TODO: Test this!
  # File lib/coderay/tokens.rb, line 165
  def fix
    tokens = self.class.new
    # Check token nesting using a stack of kinds.
    opened = []
    for type, kind in self
      case type
      when :open
        opened.push [:close, kind]
      when :begin_line
        opened.push [:end_line, kind]
      when :close, :end_line
        expected = opened.pop
        if [type, kind] != expected
          # Unexpected :close; decide what to do based on the kind:
          # - token was never opened: delete the :close (just skip it)
          next unless opened.rindex expected
          # - token was opened earlier: also close tokens in between
          tokens << token until (token = opened.pop) == expected
        end
      end
      tokens << [type, kind]
    end
    # Close remaining opened tokens
    tokens << token while token = opened.pop
    tokens
  end
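The guarantee can be sketched on plain arrays. This simplified version covers only :open/:close (not :begin_line/:end_line) and is an illustration of the stack idea, not the library's own method:

```ruby
# Simplified sketch of #fix for plain pairs: stray :close tokens are
# dropped, and unmatched :open tokens get a :close appended at the end.
def fix_nesting tokens
  fixed, opened = [], []
  tokens.each do |type, kind|
    if type == :open
      opened.push [:close, kind]
    elsif type == :close
      next unless opened.rindex [:close, kind]  # never opened: skip it
      # close any tokens opened in between first
      fixed << opened.pop while opened.last != [:close, kind]
      opened.pop
    end
    fixed << [type, kind]
  end
  fixed.concat opened.reverse  # close everything still open
end

fix_nesting [[:open, :string], ['"a', :content]]
# [[:open, :string], ['"a', :content], [:close, :string]]
```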
Redirects unknown methods to encoder calls.
For example, if you call +tokens.html+, the HTML encoder is used to highlight the tokens.
  # File lib/coderay/tokens.rb, line 120
  def method_missing meth, options = {}
    Encoders[meth].new(options).encode_tokens self
  end
Returns the tokens compressed by joining consecutive tokens of the same kind.
This cannot be undone, but should yield the same output in most Encoders. It basically makes the output smaller.
Combined with dump, it saves space at the cost of time.
If the scanner is written carefully, this is not required; for example, consecutive //-comment lines could already be joined into one comment token by the Scanner.
  # File lib/coderay/tokens.rb, line 135
  def optimize
    last_kind = last_text = nil
    new = self.class.new
    for text, kind in self
      if text.is_a? String
        if kind == last_kind
          last_text << text
        else
          new << [last_text, last_kind] if last_kind
          last_text = text
          last_kind = kind
        end
      else
        new << [last_text, last_kind] if last_kind
        last_kind = last_text = nil
        new << [text, kind]
      end
    end
    new << [last_text, last_kind] if last_kind
    new
  end
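The effect can be shown on a plain Array of [text, kind] pairs (token texts are made up for illustration):

```ruby
# Sketch of what #optimize does: consecutive tokens of the same kind
# are merged into a single token.
tokens = [['# a', :comment], ["\n", :comment], ['# b', :comment], ['x', :ident]]

optimized = tokens.each_with_object([]) do |(text, kind), out|
  if out.last && out.last[1] == kind
    out.last[0] += text   # same kind as the previous token: append
  else
    out << [text.dup, kind]
  end
end
optimized  # [["# a\n# b", :comment], ["x", :ident]]
```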
TODO: Scanner#split_into_lines
Makes sure that:
- newlines are single tokens (so every other token is single-line)
- there are no open tokens at the end of a line
This makes it simple for encoders that work line-oriented, like HTML with list-style numeration.
  # File lib/coderay/tokens.rb, line 205
  def split_into_lines
    raise NotImplementedError
  end

  # File lib/coderay/tokens.rb, line 209
  def split_into_lines!
    replace split_into_lines
  end
Whether the object is a TokenStream.
Returns false.
  # File lib/coderay/tokens.rb, line 60
  def stream?
    false
  end
The total size of the tokens. Should be equal to the input size before scanning.
  # File lib/coderay/tokens.rb, line 238
  def text_size
    size = 0
    each_text_token do |t, k|
      size += t.size
    end
    size
  end
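On a plain Array, the computation looks like this (token texts are made up for illustration):

```ruby
# What text_size computes: the summed length of all String tokens.
# Kind-action pairs like [:open, :string] carry no text and are skipped.
tokens = [[:open, :string], ['"', :delimiter], ['ab', :content],
          ['"', :delimiter], [:close, :string]]

text_size = tokens.sum { |text, _kind| text.is_a?(String) ? text.size : 0 }
text_size  # 4
```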
Turn into a string using Encoders::Text.
options are passed to the encoder if given.
  # File lib/coderay/tokens.rb, line 112
  def to_s options = {}
    encode :text, options
  end