Aurel tells All: Interpreters
I want this topic to be the biggest topic on this forum-
he he...
joke of course

Everybody knows (almost) that I don't like Python very much
BUT
I simply cannot resist when I see this tokenizer on GitHub (I think).
It is written in a very simple-to-understand way, I would like to say in a
BASIC way in Python... so look:


Code:
KEYWORDS = ["CLS","INPUT","PRINT","IF","ELSE","FOR","TO","END","ENDIF","WHILE","WEND","UNTIL","DO","LOOP","THEN"]
SYMBOLS    = [';','=','(',')','+','-','*','\\',',','<','>']
ALPHABETS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz$#'
NUMBERS = '0123456789'
NUMBERS_WITH_DECMALPOINT = NUMBERS +'.'
ALPHANUMBERS = NUMBERS+ALPHABETS
ALPHANUMBERS_WITH_UNDERSCORE = ALPHANUMBERS+'_'

def tokenizer(codeFilePath):
    # Read the whole source file; the with-block closes the file on exit.
    with open(codeFilePath,'r') as codeFile:
        code = codeFile.read()
    i=0
    tokens = []
    while i<len(code):
        token,j='',0
        if code[i] in ALPHABETS:
            while i+j<len(code)and code[i+j] in ALPHABETS:
                token += code[i+j]
                j+=1
            if token.upper() in KEYWORDS: tokens += [(token.upper(),"KEYWORD")]
            else: tokens += [(token,"IDENTIFIER")]
            i+=j
        elif code[i] in SYMBOLS:
            tokens += [(code[i],"SYMBOL")]
            i+=1
        elif code[i] in NUMBERS:
            while i+j<len(code)and code[i+j] in NUMBERS_WITH_DECMALPOINT:
                token += code[i+j]
                j+=1
            tokens+= [(token,"NUMBER")]
            i+=j
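        elif code[i] == '"':
            # Skip the opening quote, then collect everything up to the closing quote.
            # j stays at 0 here so the first character of the string is not lost.
            i+=1
            while i+j<len(code)and code[i+j] != '"':
                token += code[i+j]
                j+=1
            tokens+=[(token,"STRING-LITERAL")]
            i+=j+1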
        elif code[i] in [" ","\t"]:
            while i+j<len(code)and code[i+j] in [" ","\t"]:
                token += code[i+j]
                j+=1
            tokens+=[(token,"WHITESPACE")]
            i+=j
        elif code[i]=="\n":
            tokens+=[(code[i],"NEWLINE")]
            i+=1
        else:
            tokens+=[(code[i],"UNIDENTIFIED")]
            i+=1
    return tokens
Reply
What I will do is:
I will try to adapt pyQB_tokenizer for my ANIscript scanner program and
also modify the right listbox control to show new tokens or the token list.
I think that might work very well.


Reply
Hello boys..
Finally I got it to work.
I had stupid bugs with adding an empty string as string "".
I still don't have quoted strings, but I hope that should work!

Code:
'Tokenizer for BASIC-like syntax, originally written in Python
'PyQB tokenizer translation to Oxygen Basic - by Aurel 2018
#lookahead
string KEYWORDS[] = {"CLS","PRINT","IF","ELSE","FOR","TO","NEXT","ENDIF","WHILE","WEND","UNTIL","DO","LOOP","THEN"}
string SYMBOLS = ":=()+-*/<>"
string ALPHABETS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz$#"
string NUMBERS = "0123456789"
string NUMBERS_WITH_DECIMALPOINT = NUMBERS + "."
string ALPHANUMBERS = ALPHABETS + NUMBERS
string ALPHANUMBERS_WITH_UNDERSCORE = ALPHANUMBERS + "_"
string tokens  'token buffer          'tokens[1024] ' token list
string crlf = chr(13)+chr(10)

function tokenizer(string code) as string
    string token, ch
    '
    'load file?
    '
    INT i,j
    '................................
    'print str(len(code))
   '.................................
    i=1
    WHILE i <= len(code)
        
        IF instr(ALPHABETS, mid(code,i,1)) <> 0            'isAlpha
            while i <= len(code) and INSTR(ALPHABETS ,mid(code,i,1)) > 0
                token = token + mid(code, i, 1)            
                 i=i+1
            wend  
           ' PRINT str i
           'print token        
            if  ucase(token)= isKeyword(token)   ' search keyword list
                tokens = tokens + token + " : KEYWORD" + crlf
                 token=""
            else
                tokens = tokens + token + " : IDENTIFIER" + crlf  'variabe
                  token=""
            end if
           'token=""
           'i=i+1
         END IF  

        IF i <= len(code) and instr(SYMBOLS, mid(code, i, 1)) > 0  'sym operators
             token = mid(code, i, 1)
              'print token
            tokens = tokens + token + " : SYMBOL" + crlf
            i=i+1
           token=""
        END IF
 
        IF instr(NUMBERS, mid(code, i, 1)) <> 0    'numbers
            while i <= len(code) and INSTR(NUMBERS_WITH_DECIMALPOINT,mid(code,i,1)) <> 0
                token = token + mid(code,i,1)
                 i=i+1
             wend
            tokens = tokens + token + " : NUMBER" + crlf
            token=""
        END IF

        'elseif ch = chr(34) 'quote "
            'token = ""
            'i = i + 1
            'j=1
            'while i+j < len(code) and mid(code,i+1,1) <> chr(34) 'string literal""
                'token = token + mid(code,i,1)
                'j=j+1
                'tokens = tokens + token + " :STRING-LITERAL" + crlf
            'i = i + j + 1
            'wend

        if i <= len(code) and mid(code, i, 1) = " "  'whitespace
             'token=""
            i=i+1
        end if

       

        'elseif ch = chr(10)
            'tokens = tokens + ch + " :NEWLINE" + crlf
            'i=i+1
        'else
            'tokens = tokens + ch + " :UNINDENTIFIED - ERROR!" + crlf
            'i=i+1

       
    
      ' i=i+1 ' increase main iterator
    WEND

    Return tokens

end function
'...........................................................
function isKeyword(byval tok as string) as string
'string ret
for n = 1 to 14
   if ucase(tok) = KEYWORDS[n]     ' if is KEYWORD
        RETURN KEYWORDS[n]  
   end if
next n
Return ""
end function
'.............................................................
'test tokenizer
string tokenList
string input = "For n= 10 To 100 :a= a*0.35 : Next n "
'call tokenizer
tokenList = tokenizer(input)
print tokenList


Reply
Hey Aurel, it's coming along nicely.  As a suggestion you may want to put "-" in numbers somewhere for negative values.
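One possible way to do that, sketched in Python like the original PyQB tokenizer. This is only an assumption about how it could work, not part of the PyQB code: a '-' is folded into the number when a digit follows and the previous token cannot end an operand; otherwise it stays a SYMBOL. The name `tokenize_expr` is made up for this example.

```python
NUMBERS = '0123456789'
SYMBOLS = '=()+-*/<>:,;'

def tokenize_expr(code):
    """Tiny tokenizer that folds a leading '-' into a NUMBER token."""
    tokens = []
    i = 0
    while i < len(code):
        ch = code[i]
        if ch == ' ':
            i += 1
            continue
        # '-' counts as a sign only at the start of input or right after
        # a symbol other than ')', and only when a digit follows it.
        prev = tokens[-1] if tokens else None
        sign_ok = prev is None or (prev[1] == "SYMBOL" and prev[0] != ')')
        is_sign = (ch == '-' and sign_ok
                   and i + 1 < len(code) and code[i + 1] in NUMBERS)
        if is_sign or ch in NUMBERS:
            j = i + 1
            while j < len(code) and code[j] in NUMBERS + '.':
                j += 1
            tokens.append((code[i:j], "NUMBER"))
            i = j
        elif ch in SYMBOLS:
            tokens.append((ch, "SYMBOL"))
            i += 1
        elif ch.isalpha():
            j = i
            while j < len(code) and code[j].isalnum():
                j += 1
            tokens.append((code[i:j], "IDENTIFIER"))
            i = j
        else:
            tokens.append((ch, "UNIDENTIFIED"))
            i += 1
    return tokens

print(tokenize_expr("a = -5 - 3"))
```

With this rule "a = -5 - 3" gives the number -5 followed by a separate minus symbol, while "a-5" still gives three tokens. The alternative is to leave '-' as a symbol everywhere and let the parser handle unary minus.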
Reply
Hi Phred
yes... one thing is that it is simpler than my own
with asci() functions, but which way is faster I am not sure.
Anyway, a tokenizer doesn't need to be fast.
Reply
Aurel,

After you finish building your Tokenizer, are you going
to use it for the engine of a parser?

After you finish building a parser, are you going
to use it to build an interpreter?

After you finish building an interpreter, are you going
to use it to build a compiler?

After you finish building a ...
dndbbs project:

Links to my MUD: (strictly 16-bit); AKA XP:

Dndbbs executables
http://www.filegate.net/pdn/pdnbasic/dnd50a1e.zip

Dndbbs source
http://www.filegate.net/pdn/pdnbasic/dnd50a1s.zip

Dndbbs upgrade
http://www.filegate.net/pdn/pdnbasic/dnd50a1u.zip

DNDDOOR - https://bit.ly/EriksDNDDoor DUNGEON - https://bit.ly/EriksDungeon
Interpreter - https://bit.ly/EriksSICK Hex Editor - https://bit.ly/EriksHexEditor Utilities - https://bit.ly/EriksUtils
QB45 files: - https://bit.ly/EriksQB45 QB64shell - https://bit.ly/QB64shell Some old QB64 versions: - https://bit.ly/OldQB64
Reply
(04-16-2018, 03:09 PM)eoredson Wrote: Aurel,

After you finish building your Tokenizer, are you going
to use it for the engine of a parser?

After you finish building a parser, are you going
to use it to build an interpreter?

After you finish building an interpreter, are you going
to use it to build a compiler?

After you finish building a ...

AGAIN!
B += x
Reply
YES AGAIN

Erik..erik...eeerik

After you finish building your Tokenizer, are you going
to use it for the engine of a parser?


Wrong... a tokenizer is not the engine of a parser.

After you finish building a parser, are you going
to use it to build an interpreter?


Wrong... a parser and an interpreter are two different things.

After you finish building an interpreter, are you going
to use it to build a compiler?


no
Reply
Look Erik & others
I have time and I don't want to repeat the old mistakes
which I made with my two interpreters - Aurel Basic and Ruben Interpreter.
And yes, the next step would probably be making a proper parser to build an AST
and a TreeWalker runtime
for my new interpreter.
What that would be... bytecode or something similar.
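To make the AST + TreeWalker idea concrete, here is a minimal Python sketch. The node shapes (plain tuples) and the `walk` function are assumptions chosen for brevity, not the design of any existing interpreter; the tree is built by hand here, where a real parser would produce it.

```python
# AST nodes as plain tuples (an assumption made for this sketch):
#   ("num", value)   ("var", name)   ("bin", op, left, right)
#   ("let", name, expr)   ("print", expr)

def walk(node, env, out):
    """Evaluate one AST node; env holds variables, out collects PRINT output."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "bin":
        _, op, left, right = node
        a = walk(left, env, out)
        b = walk(right, env, out)
        if op == "+":
            return a + b
        if op == "-":
            return a - b
        if op == "*":
            return a * b
        if op == "/":
            return a / b
        raise ValueError("unknown operator " + op)
    if kind == "let":
        env[node[1]] = walk(node[2], env, out)
        return None
    if kind == "print":
        out.append(walk(node[1], env, out))
        return None
    raise ValueError("unknown node " + kind)

# Hand-built tree for:  a = 2 + 3 * 4 : PRINT a
program = [
    ("let", "a", ("bin", "+", ("num", 2), ("bin", "*", ("num", 3), ("num", 4)))),
    ("print", ("var", "a")),
]
env, out = {}, []
for stmt in program:
    walk(stmt, env, out)
print(out)  # [14]
```

The walker handles statements ("let", "print") with the same recursion it uses for expressions, which is the part a pure math evaluator does not cover.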
Reply
In addition
I think that I need to explain some points...
why I do it slowly... first,
there are a few options for how to do some things and I am in doubt about what is
best for me or easier for me.
By the way, Ed explained... writing programs in Basic is hard work.

He also stated:

Tokenized interpreter.  Tokenizes the entire input before interpreting, and never has to call the scanner again.  If done correctly, can be 9 times faster than the Pure interpreter


Is the above the right reason?
I think yes, and I believe him.
If done properly... I mean in the first place - the tokenizer.
So the tokenizer is one of the very important parts in this.
Maybe some of you think that I talk too much, but hey, everyone has his own way of doing stuff,
right?
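A small Python sketch of the difference Ed describes, under my own assumptions: the scanner here is a hypothetical one-line regex, and instead of measuring time it only counts how often the scanner gets called when a two-line loop body runs 100 times. It does not try to confirm the "9 times faster" figure.

```python
import re

# Hypothetical scanner: numbers, names and one-character operators.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()=<>]")

def scan(line, counter):
    counter[0] += 1                      # count every call to the scanner
    return TOKEN_RE.findall(line)

def run_pure(lines, passes):
    """Pure interpreter: scans each source line again on every pass."""
    counter = [0]
    for _ in range(passes):
        for line in lines:
            tokens = scan(line, counter)
            # ... execute tokens here ...
    return counter[0]

def run_tokenized(lines, passes):
    """Tokenized interpreter: scans everything once, then reuses the tokens."""
    counter = [0]
    program = [scan(line, counter) for line in lines]
    for _ in range(passes):
        for tokens in program:
            pass                         # ... execute stored tokens here ...
    return counter[0]

body = ["a = a * 0.35", "b = b + 1"]
print(run_pure(body, 100), run_tokenized(body, 100))  # 200 2
```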
Reply
Also, here is one parser + evaluator which can show the parse tree,
written in Liberty Basic - yes... I ran the program in LBB
and it works properly.
This one uses the shunting-yard algorithm
and is good only for math expressions, not enough for a solid interpreter with commands.
But as an example of a parser it is quite good.

Code:
'[RC] Arithmetic evaluation.bas
'Build the tree (with linked nodes, in an array because LB has no pointers)
'applying the shunting-yard algorithm.
'Then evaluate the tree

global stack$   'operator/brackets stack
stack$=""

maxStack = 100
dim stack(maxStack) 'nodes stack
global SP 'stack pointer
SP = 0

'-------------------
global maxNode,curFree
global FirstOp,SecondOp,isNumber,NodeCont
global opList$
opList$ = "+-*/^"

maxNode=100
FirstOp=1   'pointers to other nodes; 0 means no pointer
SecondOp=2
isNumber=3  'like, 1 is number, 0 is operator
NodeCont=4  'number if isNumber; or mid$("+-*/^", i, 1) for 1..5 operator

dim node(NodeCont, maxNode)
'will be used from 1, 0 plays null pointer (no link)

curFree=1   'first free node
'-------------------

in$ = " 1 + 2 ^ 3 * 4 - 12 / 6 "
print "Input: "
print in$

'read tokens
token$ = "#"
while 1
    i=i+1
    token$ = word$(in$, i)
    if token$ = "" then i=i-1: exit while

    select case
    case token$ = "("
        'If the token is a left parenthesis, then push it onto the stack.
        call stack.push token$

    case token$ = ")"
        'If the token is a right parenthesis:
        'Until the token at the top of the stack is a left parenthesis, pop operators off the stack onto the output queue.
        'Pop the left parenthesis from the stack, but not onto the output queue.
        'If the stack runs out without finding a left parenthesis, then there are mismatched parentheses.
        while stack.peek$() <> "("
            'if stack is empty
            if stack$="" then print "Error: no matching '(' for token ";i: end
            'add operator node to tree
            child2=node.pop()
            child1=node.pop()
            call node.push addOpNode(child1,child2,stack.pop$())
        wend
        discard$=stack.pop$()   'discard "("

    case isOperator(token$)
        'If the token is an operator, o1, then:
        'while there is an operator token, o2, at the top of the stack, and
        'either o1 is left-associative and its precedence is equal to that of o2,
        'or o1 has precedence less than that of o2,
        '   pop o2 off the stack, onto the output queue;
        'push o1 onto the stack
        op1$=token$
        while(isOperator(stack.peek$()))
            op2$=stack.peek$()
            if (op2$<>"^" and precedence(op1$) = precedence(op2$)) _
                OR (precedence(op1$) < precedence(op2$)) then
                '"^" is the only right-associative operator
                'add operator node to tree
                child2=node.pop()
                child1=node.pop()
                call node.push addOpNode(child1,child2,stack.pop$())
            else
                exit while
            end if
        wend
        call stack.push op1$

    case else   'number
    'actually, a wrong operator could end up here, like say %
        'If the token is a number, then
        'add leaf node to tree (number)
        call node.push addNumNode(val(token$))
    end select

wend

'When there are no more tokens to read:
'While there are still operator tokens in the stack:
'   If the operator token on the top of the stack is a parenthesis, then there are mismatched parentheses.
'   Pop the operator onto the output queue.
while stack$<>""
    if stack.peek$() = "(" then print "no matching ')'": end
    'add operator node to tree
    child2=node.pop()
    child1=node.pop()
    call node.push addOpNode(child1,child2,stack.pop$())
wend

root = node.pop()
'call dumpNodes
print "Tree:"
call drawTree root, 1, 0, 3
locate 1, 10
print "Result: ";evaluate(root)

end

'------------------------------------------
function isOperator(op$)
    isOperator = instr(opList$, op$)<>0 AND len(op$)=1
end function

function precedence(op$)
    if isOperator(op$) then
        precedence = 1 _
            + (instr("+-*/^", op$)<>0) _
            + (instr("*/^", op$)<>0) _
            + (instr("^", op$)<>0)
    end if
end function

'------------------------------------------
sub stack.push s$
    stack$=s$+"|"+stack$
end sub

function stack.pop$()
    'it does return empty on empty stack or queue
    stack.pop$=word$(stack$,1,"|")
    stack$=mid$(stack$,instr(stack$,"|")+1)
end function

function stack.peek$()
    'it does return empty on empty stack or queue
    stack.peek$=word$(stack$,1,"|")
end function

'---------------------------------------
sub node.push s
    stack(SP)=s
    SP=SP+1
end sub

function node.pop()
    'it does return -999999 on empty stack
    if SP<1 then node.pop=-999999: exit function
    SP=SP-1
    node.pop=stack(SP)
end function

'=======================================
sub dumpNodes
    for i = 1 to curFree-1
        print i,
        for j = 1 to 4
            print node(j, i),
        next
        print
    next
    print
end sub

function evaluate(node)
    if node=0 then exit function
    if node(isNumber, node) then
        evaluate = node(NodeCont, node)
        exit function
    end if
    'else operator
    op1 = evaluate(node(FirstOp, node))
    op2 = evaluate(node(SecondOp, node))
    select case node(NodeCont, node)    'opList$, "+-*/^"
    case 1
        evaluate = op1+op2
    case 2
        evaluate = op1-op2
    case 3
        evaluate = op1*op2
    case 4
        evaluate = op1/op2
    case 5
        evaluate = op1^op2
    end select
end function

sub drawTree node, level, leftRight, offsetY
    if node=0 then exit sub
    call drawTree node(FirstOp, node), level+1, leftRight-1/2^level, offsetY

    'print node
    'count on 80 char mainwin
    x = 40*(1+leftRight)
    y = level+offsetY
    locate x, y
    'print  x, y,">";
    if node(isNumber, node) then
        print node(NodeCont, node)
    else
        print  mid$(opList$, node(NodeCont, node),1)
    end if

    call drawTree node(SecondOp, node), level+1, leftRight+1/2^level, offsetY
end sub

function addNumNode(num)
'returns new node
    newNode=curFree
    curFree=curFree+1
    node(isNumber,newNode)=1
    node(NodeCont,newNode)=num

    addNumNode = newNode
end function

function addOpNode(firstChild, secondChild, op$)
'returns new node
'FirstOrSecond ignored if parent is 0
    newNode=curFree
    curFree=curFree+1
    node(isNumber,newNode)=0
    node(NodeCont,newNode)=instr(opList$, op$)

    node(FirstOp,newNode)=firstChild
    node(SecondOp,newNode)=secondChild

    addOpNode = newNode
end function
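For readers who do not run LB, roughly the same idea as a short Python sketch. It is not a translation of the listing above: instead of building linked nodes it converts the tokens to postfix (RPN) and evaluates that, and the names `to_rpn` and `eval_rpn` are made up for this example. Tokens must be separated by spaces, as in the LB input.

```python
import operator

PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "/": operator.truediv, "^": operator.pow}

def to_rpn(tokens):
    """Shunting yard: reorder infix tokens into postfix order."""
    out, ops = [], []
    for tok in tokens:
        if tok in PREC:
            # Pop operators that bind tighter, or equally tight when
            # the new operator is left-associative.
            while ops and ops[-1] in PREC and (
                PREC[tok] < PREC[ops[-1]]
                or (PREC[tok] == PREC[ops[-1]] and tok not in RIGHT_ASSOC)
            ):
                out.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops and ops[-1] != "(":
                out.append(ops.pop())
            if not ops:
                raise ValueError("no matching '('")
            ops.pop()                    # discard "("
        else:
            out.append(float(tok))       # anything else must be a number
    while ops:
        op = ops.pop()
        if op == "(":
            raise ValueError("no matching ')'")
        out.append(op)
    return out

def eval_rpn(rpn):
    """Evaluate a postfix list produced by to_rpn."""
    stack = []
    for item in rpn:
        if isinstance(item, float):
            stack.append(item)
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(FUNCS[item](a, b))
    return stack[0]

tokens = " 1 + 2 ^ 3 * 4 - 12 / 6 ".split()
print(eval_rpn(to_rpn(tokens)))  # 31.0
```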
Reply
@aurel: Your subject header:

Quote:Aurel tells All: Interpreters

Could you tell us more about interpreters than we already know?

Every time I ask about your Tokenizer becoming a Parser you shoot me down.

What is it that you are working on?
Reply
Quote:YES AGAIN


Erik..erik...eeerik

After you finish building your Tokenizer, are you going
to use it for the engine of a parser?


Wrong... a tokenizer is not the engine of a parser.

After you finish building a parser, are you going
to use it to build an interpreter?


Wrong... a parser and an interpreter are two different things.

After you finish building an interpreter, are you going
to use it to build a compiler?


no

It looks like you are not building anything...
Reply
Quote:It looks like you are not building anything...
How do you know that?
Maybe you can see what I have on my computer?
If something is not presented here, that doesn't mean that it is not finished.
Reply
Oops,
sorry Erik, I forgot to read your first reply...

I am working on the TOKENIZER and I can say that it is tested and works OK.
If you think that this is a trivial thing... well yes, if you wish for something small
like just a math evaluator, but for an interpreter it needs time.
The next thing (step) is the PARSER.
This could be RECURSIVE DESCENT or PRECEDENCE CLIMBING.
I started with the precedence climbing method.
Well, I'm still not sure whether to build an AST or not, or to build
some intermediate representation.
An IR also requires a virtual machine or a TreeWalker.
I have a lot of code (examples & ideas) on how to do that
and I have tried a few already.
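For anyone who has not met precedence climbing, a minimal Python sketch of the method. It is only an illustration under my own assumptions, not the parser being written here: the names `parse_expr` and `parse_atom` are invented, tokens are plain strings, and the result is evaluated on the fly instead of building an AST or IR.

```python
import operator

# Binary operators: (precedence, right-associative?, function)
OPS = {
    "+": (1, False, operator.add),
    "-": (1, False, operator.sub),
    "*": (2, False, operator.mul),
    "/": (2, False, operator.truediv),
    "^": (3, True, operator.pow),
}

def parse_atom(tokens, pos):
    """Parse a number or a parenthesised expression."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        value, pos = parse_expr(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return value, pos + 1
    return float(tok), pos + 1

def parse_expr(tokens, pos=0, min_prec=1):
    """Return (value, next_pos) for the expression starting at tokens[pos]."""
    value, pos = parse_atom(tokens, pos)
    # Keep consuming operators while they bind at least as tightly as min_prec.
    while (pos < len(tokens) and tokens[pos] in OPS
           and OPS[tokens[pos]][0] >= min_prec):
        prec, right_assoc, fn = OPS[tokens[pos]]
        # A left-associative operator needs a strictly tighter right-hand side.
        next_min = prec if right_assoc else prec + 1
        rhs, pos = parse_expr(tokens, pos + 1, next_min)
        value = fn(value, rhs)
    return value, pos

value, _ = parse_expr("1 + 2 ^ 3 * 4 - 12 / 6".split())
print(value)  # 31.0
```

To build an AST instead, the line `value = fn(value, rhs)` would create a node from the operator and the two sides rather than computing a number.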
Reply
Hey Erik
You are a member of my forum... right?
So you can check what I post there.
Feel free to ask whatever you wish...
no problem at all
Reply
Quote:Everytime I ask about your Tokenizer becoming a Parser you shoot me down.
Come on man...
a tokenizer is a tokenizer
and a parser is a parser,
so I don't understand how you don't see or don't understand the difference.
Or my English is totally broken... maybe
my bad...

time for [$]>
Reply
Not to be annoying or incorrect but I have been following your progress on some really cool projects,

Quote:Next thing (step) is PARSER
This could be RECURSIVE DESCENT or PRECEDENCE CLIMBING
I keep getting tokenizers confused with parsers confused with interpreters and I'm wondering
which one you are working on.

Erik.
Reply
Hey Erik
I am confused too, with so many different ways to do the same things.
Most people use a recursive descent parser for an interpreter.
And the tokenizer must know how to split code into tokens,
and it can also build the variable storage, because the tokenizer can extract identifiers
into a variable list (read: array), etc... etc...
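A rough Python sketch of that last point, with assumed names (`scan_with_variables`, a shortened keyword set): while splitting the code into tokens, every identifier that is not a keyword is also recorded once in a variable list. Names are compared case-sensitively here to keep it short; a real BASIC would probably fold case first.

```python
KEYWORDS = {"FOR", "TO", "NEXT", "PRINT", "IF", "THEN"}

def scan_with_variables(code):
    """Return (tokens, variables); variables lists each identifier once, in order of first use."""
    tokens, variables = [], []
    i = 0
    while i < len(code):
        ch = code[i]
        if ch.isalpha():
            j = i
            while j < len(code) and (code[j].isalnum() or code[j] in "_$#"):
                j += 1
            word = code[i:j]
            if word.upper() in KEYWORDS:
                tokens.append((word.upper(), "KEYWORD"))
            else:
                tokens.append((word, "IDENTIFIER"))
                if word not in variables:
                    variables.append(word)   # first sighting: reserve a slot
            i = j
        elif ch.isdigit():
            j = i
            while j < len(code) and (code[j].isdigit() or code[j] == "."):
                j += 1
            tokens.append((code[i:j], "NUMBER"))
            i = j
        elif ch.isspace():
            i += 1
        else:
            tokens.append((ch, "SYMBOL"))
            i += 1
    return tokens, variables

tokens, variables = scan_with_variables("For n= 10 To 100 :a= a*0.35 : Next n ")
print(variables)  # ['n', 'a']
```

The index of a name in `variables` can then serve as its slot in a value array, so the runtime never has to look up variable names as strings.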
Reply
I'm trying to follow your parsing progress and went to your aurelsoft site at:

http://aurelsoft.ucoz.com/

and got a huge advertisement over the top-right of your page blocking what may be
the home button or something else I couldn't see..

Erik.
Reply