| Class | CodeRay::Scanners::Scanner |
| In: | lib/coderay/scanner.rb |
| Parent: | StringScanner |
The base class for all Scanners.
It is a subclass of Ruby's great StringScanner, which makes it easy to access the scanning methods inside.
It is also Enumerable, so you can use it like an Array of Tokens:
require 'coderay'
c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"
for text, kind in c_scanner
  puts text if kind == :operator
end
# prints: (*==)++;
OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
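The Enumerable behaviour can be tried without CodeRay installed. The following is a minimal sketch, not CodeRay code: MiniScanner and its token kinds are hypothetical, and only StringScanner from Ruby's standard library is assumed. It yields [text, kind] pairs from each, which is all the Enumerable methods need:

```ruby
require 'strscan'

# Hypothetical stand-in for a CodeRay scanner (not part of CodeRay).
class MiniScanner < StringScanner
  include Enumerable

  # Yields one [text, kind] pair per token, always starting at position 0.
  def each
    return enum_for(:each) unless block_given?
    reset
    until eos?
      if (text = scan(/\s+/))
        yield [text, :space]
      elsif (text = scan(/\d+/))
        yield [text, :integer]
      elsif (text = scan(/[A-Za-z_]\w*/))
        yield [text, :ident]
      elsif (text = scan(/[-+*\/=;()]/))
        yield [text, :operator]
      else
        yield [getch, :error] # consume unknown characters one at a time
      end
    end
  end
end

scanner = MiniScanner.new 'x = y + 42;'

operators   = scanner.select { |_text, kind| kind == :operator }.map { |text, _kind| text }
has_number  = scanner.any? { |_text, kind| kind == :integer }
first_ident = scanner.find { |_text, kind| kind == :ident }
longest     = scanner.sort_by { |text, _kind| text.size }.last

p operators   # ["=", "+", ";"]
p has_number  # true
p first_ident # ["x", :ident]
p longest     # ["42", :integer]
```

Each Enumerable call rescans the string from the start, because each calls reset first.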
| ScanError | = | Class.new(Exception) | Raised if a Scanner fails while scanning. |
| DEFAULT_OPTIONS | = | { :stream => false } | The default options for all scanner classes. Define @default_options for subclasses. |
| KINDS_NOT_LOC | = | [:comment, :doctype] | |

| string | -> | code | More mnemonic accessor name for the input string. |
# File lib/coderay/scanner.rb, line 86
def file_extension extension = nil
  if extension
    @file_extension = extension.to_s
  else
    @file_extension ||= plugin_id.to_s
  end
end
If you set :stream to true in the options, the Scanner uses a TokenStream with the block as callback to handle the tokens.
Otherwise, a Tokens object is used.
# File lib/coderay/scanner.rb, line 120
def initialize code='', options = {}, &block
  raise "I am only the basic Scanner class. I can't scan "\
    "anything. :( Use my subclasses." if self.class == Scanner

  @options = self.class::DEFAULT_OPTIONS.merge options

  super Scanner.normify(code)

  @tokens = options[:tokens]
  if @options[:stream]
    warn "warning in CodeRay::Scanner.new: :stream is set, "\
      "but no block was given" unless block_given?
    raise NotStreamableError, self unless kind_of? Streamable
    @tokens ||= TokenStream.new(&block)
  else
    warn "warning in CodeRay::Scanner.new: Block given, "\
      "but :stream is #{@options[:stream]}" if block_given?
    @tokens ||= Tokens.new
  end
  @tokens.scanner = self

  setup
end
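The choice between a callback-driven stream and a stored collection can be illustrated in isolation. This is only a sketch under stated assumptions: CallbackSink and build_sink are hypothetical names, a plain Array stands in for a Tokens object, and the sketch raises where the constructor above merely warns.

```ruby
# Hypothetical stand-in for a TokenStream: forwards each token to a
# callback and stores nothing itself.
class CallbackSink
  def initialize(&callback)
    @callback = callback
  end

  def <<(token)
    @callback.call(token)
    self # allow chaining: sink << a << b
  end
end

# Picks a token sink from the options, mirroring the branch in initialize.
# Unlike the constructor, this sketch raises instead of warning when
# :stream is set without a block.
def build_sink(options = {}, &block)
  if options[:stream]
    raise ArgumentError, ':stream is set, but no block was given' unless block
    CallbackSink.new(&block)
  else
    [] # plain Array standing in for a Tokens object
  end
end

seen = []
stream = build_sink(:stream => true) { |token| seen << token }
stream << ['if', :reserved] << [' ', :space]
p seen   # both tokens arrived through the block

stored = build_sink
stored << ['if', :reserved]
p stored # the token is kept in the collection itself
```

Because both sinks respond to <<, the code that produces tokens does not need to know which mode is active.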
# File lib/coderay/scanner.rb, line 69
def normify code
  code = code.to_s
  if code.respond_to?(:encoding) && (code.encoding.name != 'UTF-8' || !code.valid_encoding?)
    code = code.dup
    original_encoding = code.encoding
    code.force_encoding 'Windows-1252'
    unless code.valid_encoding?
      code.force_encoding original_encoding
      if code.encoding.name == 'UTF-8'
        code.encode! 'UTF-16BE', :invalid => :replace, :undef => :replace, :replace => '?'
      end
      code.encode! 'UTF-8', :invalid => :replace, :undef => :replace, :replace => '?'
    end
  end
  code.to_unix
end
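The repair step in the middle of normify (replacing invalid bytes by transcoding through UTF-16BE) can be tried on its own. The sketch below is a simplified variant, not the method above: to_valid_utf8 is a hypothetical helper, it skips the Windows-1252 attempt, and it assumes that to_unix converts \r\n and \r line endings to \n.

```ruby
# Hypothetical helper, not part of CodeRay. Returns a valid UTF-8 copy of
# the input with Unix line endings.
def to_valid_utf8(code)
  code = code.to_s
  unless code.encoding == Encoding::UTF_8 && code.valid_encoding?
    # Going through UTF-16BE forces a real transcoding step, so invalid or
    # undefined bytes are replaced by '?'.
    code = code.encode('UTF-16BE', :invalid => :replace, :undef => :replace, :replace => '?')
    code = code.encode('UTF-8')
  end
  # Assumption: this is what to_unix does in the listing above.
  code.gsub(/\r\n?/, "\n")
end

broken = "a\xFFz\r\n".dup.force_encoding('UTF-8')
p broken.valid_encoding? # false
p to_valid_utf8(broken)  # "a?z\n"
```

The byte 0xFF can never appear in valid UTF-8, so it is replaced; valid input passes through with only its line endings changed.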
# File lib/coderay/scanner.rb, line 208
def column pos = self.pos
  return 0 if pos <= 0
  string = string()
  if string.respond_to?(:bytesize) && (defined?(@bin_string) || string.bytesize != string.size)
    @bin_string ||= string.dup.force_encoding('binary')
    string = @bin_string
  end
  pos - (string.rindex(?\n, pos) || 0)
end
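The arithmetic in column can be checked outside a scanner. The sketch below is a hypothetical standalone function (column_at is not a CodeRay name); like the method above, it searches a binary copy of the string so that the index of the last newline is a byte offset, matching the byte-based pos of StringScanner.

```ruby
# Hypothetical standalone version of the arithmetic used by column.
# pos is a byte offset, as returned by StringScanner#pos.
def column_at(source, pos)
  return 0 if pos <= 0
  bytes = source.b # binary copy, so rindex counts bytes instead of characters
  pos - (bytes.rindex("\n", pos) || 0)
end

source = "\u00e9 = 1\nxy"  # the first character occupies two bytes in UTF-8
p column_at(source, 8)     # 2: byte 8 is "y", the newline is byte 6
p column_at("ab\ncd", 1)   # 1: no newline before pos, so pos is returned
p column_at("ab\ncd", 0)   # 0
```

Without the binary copy, rindex would return a character index, and subtracting it from a byte offset would give a wrong column whenever multibyte characters precede the position.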
# File lib/coderay/scanner.rb, line 222
def marshal_load options
  @options = options
end
Whether the scanner is in streaming mode.
# File lib/coderay/scanner.rb, line 188
def streaming?
  !!@options[:stream]
end
# File lib/coderay/scanner.rb, line 149
def string= code
  code = Scanner.normify(code)
  if defined?(RUBY_DESCRIPTION) && RUBY_DESCRIPTION['rubinius 1.0.1']
    reset_state
    @string = code
  else
    super code
  end
  reset_instance
end
Scans the code and returns all tokens in a Tokens object.
# File lib/coderay/scanner.rb, line 170
def tokenize new_string=nil, options = {}
  options = @options.merge(options)
  self.string = new_string if new_string
  @cached_tokens =
    if @options[:stream]  # :stream must have been set already
      reset unless new_string
      scan_tokens @tokens, options
      @tokens
    else
      scan_tokens @tokens, options
    end
end
Raises a ScanError with additional status information.
# File lib/coderay/scanner.rb, line 253
def raise_inspect msg, tokens, state = 'No state given!', ambit = 30
  raise ScanError, "\n\n***ERROR in %s: %s (after %d tokens)\n\ntokens:\n%s\n\ncurrent line: %d column: %d pos: %d\nmatched: %p state: %p\nbol? = %p, eos? = %p\n\nsurrounding code:\n%p ~~ %p\n\n\n***ERROR***\n\n" % [
    File.basename(caller[0]),
    msg,
    tokens.size,
    tokens.last(10).map { |t| t.inspect }.join("\n"),
    line, column, pos,
    matched, state, bol?, eos?,
    string[pos - ambit, ambit],
    string[pos, ambit],
  ]
end
# File lib/coderay/scanner.rb, line 246
def reset_instance
  @tokens.clear unless @options[:keep_tokens]
  @cached_tokens = nil
  @bin_string = nil if defined? @bin_string
end
Shorthand for scan_until(/\z/). This method also avoids a JRuby 1.9 mode bug.
# File lib/coderay/scanner.rb, line 287
def scan_rest
  rest = self.rest
  terminate
  rest
end
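The same two steps work on any StringScanner. A minimal sketch using only Ruby's standard library; take_rest is a hypothetical helper name, not part of CodeRay:

```ruby
require 'strscan'

# Hypothetical free-standing equivalent of scan_rest for any StringScanner.
def take_rest(scanner)
  rest = scanner.rest # everything from the current position to the end
  scanner.terminate   # move the scan pointer to the end of the string
  rest
end

scanner = StringScanner.new 'key = some value'
scanner.scan(/\w+\s*=\s*/) # consume "key = "
p take_rest(scanner)       # "some value"
p scanner.eos?             # true
```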
This is the central method, and commonly the only one a subclass implements.
Subclasses must implement this method; it must return tokens and must only use Tokens#<< for storing scanned tokens!
# File lib/coderay/scanner.rb, line 241
def scan_tokens tokens, options
  raise NotImplementedError,
    "#{self.class}#scan_tokens not implemented."
end
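To illustrate the contract, here is a minimal sketch of a scan_tokens implementation. KeyValueScanner and its token kinds are hypothetical; it subclasses StringScanner directly so that it runs without CodeRay, whereas a real scanner would subclass CodeRay::Scanners::Scanner. Tokens are stored as [text, kind] pairs, and << is the only method called on the tokens object.

```ruby
require 'strscan'

# Hypothetical scanner for "key = value # comment" lines (not part of CodeRay).
class KeyValueScanner < StringScanner
  # Appends [text, kind] pairs to tokens using only << and returns tokens.
  def scan_tokens(tokens, options = {})
    until eos?
      if (text = scan(/\s+/))
        tokens << [text, :space]
      elsif (text = scan(/#.*/))
        tokens << [text, :comment]
      elsif (text = scan(/=/))
        tokens << [text, :operator]
      elsif (text = scan(/\w+/))
        tokens << [text, :ident]
      else
        tokens << [getch, :error] # consume one character so the loop always advances
      end
    end
    tokens
  end
end

tokens = KeyValueScanner.new('port = 80 # http').scan_tokens([])
p tokens.map { |_text, kind| kind }
# [:ident, :space, :operator, :space, :ident, :space, :comment]
p tokens.map { |text, _kind| text }.join # "port = 80 # http"
```

Every character ends up in exactly one token, so joining the token texts reproduces the input.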