Aurel tells All: Interpreters
#81
EdDavis, glad you are back!
B += x
#82
OK, now I am confused...

Do you want an interpreter, an evaluator, a parser, a tokenizer, an expression parser, or a
recursive descent parser? I have 2 of the above: What64.zip is a recursive descent parser,
SICK is an interpreter.

Technically, I do not have a tokenizer. Sorry.

BTW: I looked at my Whatis parser, and it could not be edited to store token lists because
it only parses one line. I also looked at my SICK, and it could not be either, because it is
strictly an interpreter and can't store token lists per program.
dndbbs project:

Links to my MUD: (strictly 16-bit); AKA XP:

Dndbbs executables
http://www.filegate.net/pdn/pdnbasic/dnd50a1e.zip

Dndbbs source
http://www.filegate.net/pdn/pdnbasic/dnd50a1s.zip

Dndbbs upgrade
http://www.filegate.net/pdn/pdnbasic/dnd50a1u.zip

DNDDOOR - https://bit.ly/EriksDNDDoor DUNGEON - https://bit.ly/EriksDungeon
Interpreter - https://bit.ly/EriksSICK Hex Editor - https://bit.ly/EriksHexEditor Utilities - https://bit.ly/EriksUtils
QB45 files: - https://bit.ly/EriksQB45 QB64shell - https://bit.ly/QB64shell Some old QB64 versions: - https://bit.ly/OldQB64
#83
Tokenizing:

http://php.net/manual/en/book.tokenizer.php

Erik.
#84
How does Tokenizer fit into the overall process of interpreting a program?

There must be a speed advantage to doing all this extra work (in my eyes) of labeling everything and tracking line and column (though I can see that being handy for error messages).

Is a program pre-scanned and processed, like Phred did, to record where all the jump targets are? That I can see saving time (I think; I haven't tried it yet).

Since tokenizers are also discussed alongside evaluation code, it seems to me that, for an interpreter, a tokenizer is just a bigger and fancier version of the one used in evaluation code (turning string expressions into values).

If these assumptions are (almost, they do not have to be perfectly) correct, can someone outline how tokenized strings help in evaluating string expressions? Just a gist or brief summary from the perspective of the overall processing?

If an assumption is completely wrong? How can that be!?!? ;-)


Also, something I haven't tried yet is binary lookup of variable values by name. How many variables does it take before binary lookup starts to pay for itself by finding values faster than a brute-force scan? It has to pay for the hassle of keeping the tables sorted every time a new variable is created. I would think hundreds, maybe thousands, of variables???

BTW, my goal is building little hobby interpreters to play around with, not big giant things for corporations that have to be absolutely bulletproof and travel at the speed of light. I prefer to leave such boring work to the professionals.
#85
Mark,
you throw out a lot of questions, which creates confusion.
A tokenizer by itself is nothing if you don't create a proper parser
that transforms the tokenized code into an Abstract Syntax Tree (AST),
which is then executed by an evaluator (a VM or a tree walker).
Do you get it now?
I have found a very good and simple example on CodeProject.
Ed presented a good thing (as usual), and he has the knowledge and experience: Toy Bytecode Interpreter
#86
I suspect some of the confusion is due to different words being used for the same thing.  I could be wrong, but I'm under the impression that a tokenizer and a lexer are pretty much the same thing: they take a string of text and break it into usable chunks (tokens), which are then fed to the parser, which can use different methods, such as an Abstract Syntax Tree, to process those chunks into actions.  Depending on the output of the parser, you could end up with BASIC code, which is what mine does, and it then processes the instructions immediately instead of translating the whole thing before it starts; or, in the case of Jack Crenshaw's tutorial, it outputs asm code, which is all translated before being fed to the assembler, which produces the final program.  I don't know if this helps or just muddies the water even more, but you sort of have to agree on a common terminology.
#87
Ha! Yes, I throw questions out because I am thrown by such terms as "Abstract Syntax Tree".

Yeah, maybe AST does explain everything, if one knows what the heck that is! ;-)

That's OK. I can be in the dark a little; if enough people describe their perspective of things, I can usually pick up the gist of what is going on... LOL

So, can AST be described without throwing out three more acronyms I've never heard of? :-D

I swear all these random letters go in one eye and out the other, with nothing in between to slow them down. :-O
#88
Mark,
do you know what a tree view control is in Windows programming?
It is something like that.
Let's say it like this:
expression -> 3+4
operator [+] -- left(3), right(4)

But if you have:
3+4*2
you must have some ordering that sorts your expression by operator
precedence, so first the multiplication, then the addition... right?

if op[*], prec=3 -> getLeft(4), getRight(2)   -> stackValue(8)
if op[+], prec=2 -> getLeft(3), getStackValue(8)
#89
Quote:How does Tokenizer fit into the overall process of interpreting a program?

Phred had a great reply.  What he says is correct.

Scanner/Lexical analyzer/tokenizer are all exactly the same thing.

In a traditional compiler/interpreter:

Scanner: Analyze the code breaking it up into consistent pieces.

Parser: Take the output of the scanner, and make sure the syntax is correct.  During this phase, one of several things can happen:

* Generate code (machine code, assembly, virtual machine code)

* Or, generate an intermediate representation.  An Abstract Syntax Tree (AST), or quadruples, or triples.

* Or, as you are parsing, interpret the code.

Semantic Analyser (not needed for simple languages)

This phase checks to make sure the meaning (thus, semantics) of the language is obeyed - generally, it traverses the AST to make sure you are not trying to multiply or divide an integer by a string, for instance.

Code generator:  Taking the intermediate form produced above, generate code.

Quote:There must be speed advantage for doing all this (in my eyes) extra work of labeling
everything and tracking line and column (though that I can see handy for error messages).

If these assumptions are (almost, does not have to be perfectly) correct, can someone
outline process how tokenized strings help in string expression evaluations? Just a gist
or brief summary from perspective of over all processing?

I think the below answers your questions - if not, just ask again, but a little differently :-)

Notes:

What does a scanner buy you?

* it breaks your code into smaller self-contained pieces, which is always a good idea.

* in general, using a scanner to convert the code to integers (e.g., ident, gte, lss, equal) makes the resulting parser faster, as it is faster to compare integers than to compare strings.  But for small programs on today's computers, this extra speed isn't noticeable.

You don't really need to create a separate scanner.  For a good, small example, see:
https://github.com/tgibson37/tiny-c

Main program (tc.c) is only about 1000 lines of code, pretty easy to follow.

He uses a "scannerless parser".  For a simple hobby interpreter, a separate scanner isn't strictly necessary, so there is no real need to do otherwise.

Quote:Also what I haven't tried yet is binary lookup of variable values from names, how many
variables does it take before binary lookup starts to pay for itself for finding values
faster than brute force scan? Has to pay for hassle of keeping tables sorted with every
new variable created. I would think hundreds maybe thousands of variables???

According to many compiler and algorithm and data structure text books, hash tables are the way to go, as they can be a good bit faster than binary trees.

As for when a faster lookup starts to pay off, the conventional wisdom is somewhere around 7-21 variables, depending on the processor.

Quote:So can AST be described without throwing out three more acronyms I've never heard of?

An AST is just a conversion of the input source code into a form that is more easily used by a program for further analysis.

So, instead of a = 5 + 2 * 3

An AST might look like (expressed as a binary tree):

Code:
type:  assign
left:  variable (a)
right: operator (+)

type:  operator (+)
left:  number (5)
right: operator (*)

type:  operator (*)
left:  number (2)
right: number (3)

This is usually formulated as an N-way tree, but you can also use a binary tree.  And binary trees can also be simulated using arrays.

Again, AST's are never needed.  It is mainly useful if you want to do further analysis on the code.  If you are generating lower level code (byte code or real machine code), an AST can make that easier to do.

Many early Pascal and C compilers just tokenized the input, passed that to the parser, and the parser did the rest, without an AST.

So, you can easily create an interpreter that does not use a traditional scanner and that does not create an AST.

Only create a separate scanner if you really need one, and only create an AST if you really need one.

Your earlier interpreter looked like it was doing just fine without either scanner or AST.
#90
Ed, thanks, very helpful, especially that last sentence! :-)
#91
Ed,
if my memory serves me,
you told me that creating an AST is a good thing, and that you know that from experience.
Now you are saying different things.
Look, I don't want to argue with you or with anyone else here,
and every contribution is welcome.
From what I have read, building an AST is a must if we want to use a tree walker
instead of a virtual machine like the one you built in Toy.

If other members are interested, then we can continue talking about this topic,
but if NOT...
then I will leave this topic?
#92
Quote:if my memory serves me
You tell me that crating AST is good thing and that you know that from experience
Now you talking different things.
Look i don't want to argue with you or with anyone else here
and every contribution is welcome
As far from what i read building AST is must if we want to use TreeWalker
instead of VirtualMachine like you build in toy.

I think you did not understand what I said:

Quote:Again, AST's are never needed.  It is mainly useful if you want to do further analysis on the code.  If you are generating lower level code (byte code or real machine code), an AST can make that easier to do.

Many early Pascal and C compilers just tokenized the input, passed that to the parser, and the parser did the rest, without an AST.

So, you can easily create an interpreter that does not use a traditional scanner and that does not create an AST.

Only create a separate scanner if you really need one, and only create an AST if you really need one.

Your earlier interpreter looked like it was doing just fine without either scanner or AST.

In summary, AST's are not a necessity, especially for simple scripting languages.  However, they are very useful, and can actually make the development of the compiler easier (in my opinion).  In the Interpreter I am developing, I am using an AST.  For early versions, I'm interpreting the AST.  Later on, I'll generate Byte code from the AST.  While you don't need an AST to generate Byte code, for me at least, it makes it much easier to have one.

Especially if you are translating statements like a "for" statement, an AST can make that much easier than doing it without one.

Also, with respect to the speed of the interpreter, an AST interpreter can be much faster than a "pure" interpreter.  Based on my reading and testing, I've determined the following:

From slowest to fastest:
  • Pure interpreter.  Has a scanner, but does not save the results for reuse.
  • Tokenized interpreter.  Tokenizes the entire input before interpreting, and never has to call the scanner again.  If done correctly, can be 9 times faster than the Pure interpreter.
  • AST interpreter.  If done correctly, can be 10 times faster than the Tokenized interpreter.
  • Byte code interpreter.  If done correctly, can be 3 times faster than the AST interpreter.
  • Generate assembly or machine code.  If good code is generated, can be 8 times faster than the Byte code interpreter.


So yes, I agree with you.  An AST is a good thing.  I think you should use them in your interpreters.  I did not really understand them for many years, and did not use them.  In the only commercial compiler project that I was involved with, we did not use them, simply because I didn't understand them.  But I wish I had - we could have developed a much better compiler using them. But now that I understand them, I use them to my advantage.  But an AST is not a necessity.

See the Rosetta Code entry for compilers.  It has a simple Scanner, Parser (generates AST), AST interpreter, Code generator, and Byte code interpreter. That is a task I came up with, and I wrote the specifications and the C and Python examples.  I also wrote the FreeBasic Scanner, but ran out of gas using Basic for the other parts.  I only use Basic so I can converse with y'all.  I much prefer C and Python.  Coding in them is fun. Coding in Basic is hard work!

Finally, just to be sure:  I am _not_ a compiler expert.  I like them, I have an extensive library of compiler books, I've written many (simple) ones, was the head of a team that developed a simple commercial one (macro language for an application), and I greatly enjoy discussing them - that is why I'm here.  But I'm still learning and experimenting just like most of the rest of us.

I read the comp.compilers newsgroup, and realize that those guys are in an entirely different league than I am - they're in the major leagues, and I'm just in some amateur bush league in some small town.
#93
Quote:From slowest to fastest:
  • Pure interpreter.  Has a scanner, but does not save the results for reuse.
  • Tokenized interpreter.  Tokenizes the entire input before interpreting, and never has to call the scanner again.  If done correctly, can be 9 times faster than the Pure interpreter.
  • AST interpreter.  If done correctly, can be 10 times faster than the Tokenized interpreter.
  • Byte code interpreter.  If done correctly, can be 3 times faster than the AST interpreter.
  • Generate assembly or machine code.  If good code is generated, can be 8 times faster than the Byte code interpreter.

This is the overview gist I have been needing to see to put things into perspective! Thank you!

9 x 10 x 3 x 8 = 2160, worth looking into for graphics, I think.

Now, this tokenizer business: is this what they are talking about when they say Just Basic or QBasic (or is it QuickBASIC) is tokenized?

Now what the heck is a treewalker?

Pete will say, "something less moving than a streetwalker?" Well, I guess I said it for him.
Pete, manBot at your service. :-D :-D
#94
Quote:Now this tokenizer business, is this what they are talking about when they say Just Basic or QBasic (or is it Quick Basic) is tokenized...

Yes, I believe so.  Repeatedly tokenizing (or scanning or Lexical analysis) the input can be time consuming.  By first tokenizing the whole file, and then referring back to those tokens, you can make a much faster interpreter.
And, while most people think it is easier not to bother with a scanner/tokenizer/lexical analyzer at all, it turns out to be just the opposite.  Once you've used a separate scanner, it makes other coding so much easier.  Maybe not for a simple calculator, but once you add keywords and strings, a scanner makes the whole thing much simpler to work with. Well, it did for me at least.

re: Tree Walking

The Wikipedia article is pretty good: https://en.wikipedia.org/wiki/Tree_traversal

See the Rosetta Code entry for Arithmetic evaluation for several Calculators utilizing AST's written in several Basic dialects.  I much prefer the FreeBasic one (since I wrote it :-) ).

Evaluating the AST is as simple as below:

Code:
function eval(byval t as Tree ptr) as double
    if t <> 0 then
        select case t->op
            case minus_sym:       return eval(t->leftp) - eval(t->rightp)
            case plus_sym:        return eval(t->leftp) + eval(t->rightp)
            case mul_sym:         return eval(t->leftp) * eval(t->rightp)
            case div_sym:         return eval(t->leftp) / eval(t->rightp)
            case unary_minus_sym: return -eval(t->leftp)
            case unary_plus_sym:  return  eval(t->leftp)
            case number_sym:      return t->value
            case else:            error_msg("unexpected tree node")
        end select
    end if
    return 0
end function

Building it isn't much harder:

Code:
function primary as Tree ptr
    dim t as Tree ptr = 0

    select case sym
        case minus_sym, plus_sym
            dim op as Symbol = sym
            getsym()
            t = expr(prec(unary_minus_sym))
            if op = minus_sym then return make_node(unary_minus_sym, t, 0)
            if op = plus_sym  then return make_node(unary_plus_sym,  t, 0)
        case lparen_sym
            getsym()
            t = expr(0)
            if sym <> rparen_sym then error_msg("expecting rparen")
            getsym()
            return t
        case number_sym
            t = make_node(sym, 0, 0)
            t->value = tokenval
            getsym()
            return t
        case else: error_msg("expecting a primary")
    end select
end function

function expr(byval p as integer) as Tree ptr
    dim t as Tree ptr = primary()

    while is_binary(sym) andalso prec(sym) >= p
        dim t1 as Tree ptr
        dim op as Symbol = sym
        getsym()
        t1 = expr(prec(op) + 1)
        t = make_node(op, t, t1)
    wend
    return t
end function
#95
Quote:Tokenized interpreter.  Tokenizes the entire input before interpreting, and never has to call the scanner again.  If done correctly, can be 9 times faster than the Pure interpreter.
Yes Ed, I agree with you about that.
As I said before, I will try the approach quoted above first,
because it looks easiest to me and the execution speed should be enough.
#96
Quote:In the Interpreter I am developing, I am using an AST.

- Does that mean that you are working on something new, or are you speaking in general?

Quote:I only use Basic so I can converse with y'all.  I much prefer C and Python.  Coding in them is fun. Coding in Basic is hard work!

- Yeah, this is very visible from your coding style:
you use a lot of underscores _ in variable names like next_char etc...
#97
Quote:9 X 10 X 3 X 8 = 2160    worth looking into for graphics I think.

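My numbers were off (sorry about that). Here is the speedup of each level over the previous one, with my earlier estimate in parentheses:
  • Pure:               (baseline)
  • Tokenized:          8.4     (9)
  • AST:                6.2     (10)
  • Byte code:          2.75    (3)
  • native:             8       (8)

So 8.4 * 6.2 * 2.75 * 8 = 1145.76, or roughly 1146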

To give a more concrete perspective, here are some actual timings, rounded off to the nearest second:
  • Pure: 1154 seconds
  • Tokenized: 137 seconds
  • AST: 22 seconds
  • Byte code: 8 seconds
  • native: 1 second

This was a heavily integer/math/loop/if statement intensive benchmark, the sort of thing one uses for graphics (except real numbers are used instead of integers).

This was also one specific test, and with a highly optimized AST interpreter, and a highly optimized Byte code interpreter.  And using GNU C with -O3 for the native code.  Both the Pure and Tokenized interpreter could have possibly been sped up somehow, but I didn't spend much time on optimizing those, and I'm not much interested in such beasts, except for "show and tell" :-)  At least for me, the pure and tokenized interpreters are harder to code.  Maybe not for calculators, but once you have functions with parameters, things can get messy quickly.
#98
Quote:Is that mean that you work on something new or you speak in general?

Something new.  I have a Byte code interpreter that was first developed in 1991.  It has only integers and strings.  It has a tokenizer, parser, and generates byte code straight from the parser.  I want to add lists (flexible arrays, like Python) and associative arrays (like Javascript/Python) and real numbers.  First step is for the parser to generate an AST.  Then I'll write an interpreter for that.  Once that is finished and tested, I'll work on generating byte code from the AST.

Quote:- Yeah this is very visible from your coding style
u use lot of undescores _ to name varaibles like next_char etc..

Not to start a war, but which is more readable:

NotToStartAWarButWhichDoestThouPerceiveToBeMoreLegible

not_to_start_a_war_but_which_doest_thou_perceive_to_be_more_legible

So that is why I use underscores.
#99
Quote:I want to add lists (flexible arrays, like Python) and associative arrays (like Javascript/Python) and real numbers.

Ed... of course without a war...
Well, nice to know... flexible arrays and associative arrays, good; all that you can try in
Oxygen Basic too. But I think that Python lists are more like linked lists.
Look, I really don't have anything against C, but Python... I really hate that language.
Do you know that Python is slower than your Toy bytecode interpreter?
In general, Python sucks on Windows.

By the way, you already have real numbers (I mean floats) in your Toy bytecode interpreter,
right?

Also, when you say that you are not an expert... heh heh, OK, if you like.
You are better than so many so-called experts, including 'you know who-ML'.
Over the past few months I have collected a large number of interpreter sources.
And here is one of them: a lexer, aka tokenizer.

Devin Barron: Coded PrintLines function
Jonathan Brownlee: wrote a large part of PrintTokens, ran test cases on it.
Cody Boyer: Created original skeleton and error handling for most cases, skeleton modified by others later.
Christian Acosta: Added code for error handling and assisted with testing of program.

Code:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

void PrintLines(int commentsIncluded, FILE *ifp);
void PrintTokens(FILE *ifp);

typedef enum token {nulsym = 1, identsym = 2, numbersym = 3, plussym = 4, minussym = 5, multsym = 6, slashsym = 7, oddsym = 8,
    eqlsym = 9, neqsym = 10, lessym = 11, leqsym = 12,
    gtrsym = 13, geqsym = 14, lparentsym = 15, rparentsym = 16, commasym = 17, semicolonsym = 18, periodsym = 19,
    becomessym = 20, beginsym = 21, endsym = 22, ifsym = 23, thensym = 24, whilesym = 25, dosym = 26, callsym = 27,
    constsym = 28, varsym = 29, procsym = 30, writesym = 31, readsym = 32, elsesym = 33} token_type;

int main(int argc, char* argv[]){
    FILE *ifp;
    int i;

    // Require an input file name
    if(argc < 2){
        fprintf(stderr, "usage: %s file [--source] [--clean]\n", argv[0]);
        return 1;
    }
    ifp = fopen(argv[1], "r");
    if(ifp == NULL){
        perror(argv[1]);
        return 1;
    }

    // Check for --source or --clean command arguments for printing
    for(i = 2; i < argc; i++){
        if(strcmp(argv[i],"--source")==0)
        {
            // Print with comments
            PrintLines(1, ifp);
            rewind(ifp);
        }
        if(strcmp(argv[i],"--clean")==0)
        {
            // Print without comments
            PrintLines(0, ifp);
            rewind(ifp);
        }
    }

    PrintTokens(ifp);
    // Close the file after reading
    fclose(ifp);

    return 0;
}


// Prints output if --source or --clean are given as command arguments
// If commentsIncluded = 1, print comments. If 0, don't.
void PrintLines(int commentsIncluded, FILE *ifp){
    // Declare variables
    char current;
    int halt;

    // Header for printing without comments
    if(commentsIncluded==0)
    {
        printf("\nsource code without comments:\n");
        printf("-----------------------------\n");
    }
    // Header for printing with comments
    else
    {
        printf("\nsource code:\n");
        printf("------------\n");
    }
    
    // Scan new characters until the end of the file
    while(fscanf(ifp, "%c", &current) == 1)
    {
        // Check for a potential comment
        if(current=='/' && commentsIncluded==0)
        {
            // Read another character; a lone '/' at end of file is printed as-is
            if(fscanf(ifp, "%c", &current) != 1)
            {
                printf("/");
                break;
            }
            // Confirm whether or not a comment has been found
            if(current=='*')
            {
                // Previous character, used to detect the closing "*/"
                char prev = 0;
                // Replace initial two comment characters with two spaces
                printf("  ");
                // Set halt to 0 until end of comment
                halt=0;
                // Find the end of the comment (stop at end of file if unterminated)
                while(!halt && fscanf(ifp, "%c", &current) == 1)
                {
                    // Replace comment characters with spaces
                    printf(" ");
                    // "*/" ends the comment (also handles runs like "**/")
                    if(prev=='*' && current=='/')
                        halt=1;
                    prev = current;
                }
            }
            // Not a comment, print the two characters
            else
            {
                printf("/");
                printf("%c", current);
            }
        }
        // Not a comment, print the character
        else
        {
            printf("%c", current);
        }
    }
    // Line break
    printf("\n");
}


// Reads in each token and prints the values
// Consists of an obnoxiously large switch statement to handle the tokens
void PrintTokens(FILE *ifp)
{
    char string[13]; // stores variable names and reserved words for strcmp
    char current; // stores fgetc returned char
    int found = 0; // used for finding end of comment
    int no_scan = 0; // used to prevent skipping over non-whitespace ex. ">4"
    int counter = 0; // keeps track of array length
    int reserved = 0;
    int i = 0;
    int num;
    
    // Clear the whole buffer so a stored name is always NUL-terminated
    memset(string, '\0', sizeof string);

    while(!feof(ifp)){
<<<<<<< HEAD

    // prevents skipping over a valid token if one was scanned for a multiple-character token during the switch statement
    if(no_scan == 0)
        current = fgetc(ifp);
    // reinitialize no_scan to 0
    else
        no_scan = 0;
    // reinitialize found to 0
    found = 0;

    // filters out white space from the rest of the if-statements
    if(!isspace(current))
    {
        // Check to see if the current character is a character other than a letter or number
        if(!isalpha(current) && !isdigit(current))
        {
            switch(current)
            {
            // possible comment case
            case '/':
                current = fgetc(ifp);

                // if comment, loop until the end of the comment (or end of file)
                if(current == '*')
                {
                    // previous character, used to detect the closing "*/"
                    char prev = 0;
                    while(found == 0 && !feof(ifp))
                    {
                        current = fgetc(ifp);
                        if(prev == '*' && current == '/')
                            found = 1;
                        prev = current;
                    }
                }

                // else print slashsym
                else
                {
                    printf("/\t%d\n", slashsym);
                    no_scan = 1;
                }
                break;

            // various single char operator and special symbol cases
            case '*':
                printf("*\t%d\n", multsym);
                break;
            case '+':
                printf("+\t%d\n", plussym);
                break;
            case '-':
                printf("-\t%d\n", minussym);
                break;
            case '(':
                printf("(\t%d\n", lparentsym);
                break;
            case ')':
                printf(")\t%d\n", rparentsym);
                break;
            case ',':
                printf(",\t%d\n", commasym);
                break;
            case '.':
                printf(".\t%d\n", periodsym);
                break;
            case ';':
                printf(";\t%d\n", semicolonsym);
                break;
            case '=':
                printf("=\t%d\n", eqlsym);
                break;
            case '<':
                // check next char
                current = fgetc(ifp);
                if(current == '=')
                    printf("<=\t%d\n", leqsym);
                else if(current == '>')
                    printf("<>\t%d\n", neqsym);
                else
                {
                    printf("<\t%d\n", lessym);
                    // the char just read belongs to the next token
                    no_scan = 1;
                }
                break;
            case '>':
                // check next char
                current = fgetc(ifp);
                if(current == '=')
                    printf(">=\t%d\n", geqsym);
                else
                {
                    printf(">\t%d\n", gtrsym);
                    no_scan = 1;
                }
                break;

            // possible becomessym case
            // error here if no '=' ?
            case ':':
                // scan for next char
                current = fgetc(ifp);
                // check for becomessym case, print symbol and associated int if found
                if(current == '=')
                    printf(":=\t%d\n", becomessym);
                else
                    no_scan = 1;
                break;
            default:
                continue;
            }
        }

        // likely variable or reserved word case
        if(isalpha(current))
        {
            // scans for reserved words or variables (at most 12 chars, leaving room for the '\0')
            while(found == 0 && counter < 12)
            {
                string[counter] = current;
                counter++;
                current = fgetc(ifp);

                if(isdigit(current))
                    reserved = 1;

                if(isspace(current))
                    found = 1;

                else if(!isalpha(current) && !isdigit(current))
                {
                    if(!isspace(current) && (current != ',') && (current != ';') && (current != '.') && (current != ')') && (current != '(') && (current != '+') && (current != '-') && (current != '*') && (current != '/') && (current != '=') && (current != ':') && (current != '>') && (current != '<')){
                        printf("\nERR: Unidentified Token %s",string);
                        return;
                    }

                    found = 1;
                    no_scan = 1;
                }
            }
            
            // print reserved words Or variable here
            if(reserved == 1 || counter < 2 || (counter > 9 && counter < 12))
                printf("%s\t%d\n", string, identsym);
            // test string for reserved words
            else if(reserved == 0)
            {
                if(counter < 3)
                {
                    // test for if, do
                    if(strcmp(string,"if") == 0)
                        printf("if\t%d\n", ifsym);

                    else if(strcmp(string,"do") == 0)
                        printf("do\t%d\n", dosym);

                    else
                        printf("%s\t%d\n", string, identsym);
                }
                else if(counter < 4)
                {
                    // test for var, end, odd
                    if(strcmp(string, "var") == 0)
                        printf("var\t%d\n", varsym);

                    else if(strcmp(string, "end") == 0)
                        printf("end\t%d\n", endsym);

                    else if(strcmp(string, "odd") == 0)
                        printf("odd\t%d\n", oddsym);

                    else
                        printf("%s\t%d\n", string, identsym);
                }
                else if(counter < 5)
                {
                    // test for then, else, read, call
                    if(strcmp(string, "then") == 0)
                        printf("then\t%d\n", thensym);

                    else if(strcmp(string, "else") == 0)
                        printf("else\t%d\n", elsesym);

                    else if(strcmp(string, "read") == 0)
                        printf("read\t%d\n", readsym);

                    else if(strcmp(string, "call") == 0)
                        printf("call\t%d\n", callsym);

                    else
                        printf("%s\t%d\n", string, identsym);
                }
                else if(counter < 6)
                {
                    // test for const, begin, while, write
                    if(strcmp(string, "const") == 0)
                        printf("const\t%d\n", constsym);

                    else if(strcmp(string, "begin") == 0)
                        printf("begin\t%d\n", beginsym);

                    else if(strcmp(string, "while") == 0)
                        printf("while\t%d\n", whilesym);

                    else if(strcmp(string, "write") == 0)
                        printf("write\t%d\n", writesym);

=======
        
        // prevents skipping over a valid token if it was already read ahead while scanning a multiple-character token in the switch statement
        if(no_scan == 0)
            current = fgetc(ifp);
        // reinitialize no_scan to 0
        else
            no_scan = 0;
        // reinitialize found to 0
        found = 0;
        
        // filters out white space from the rest of the if-statements
        if(!isspace(current))
        {
            // check to see if the current character is a character other than a letter or number
            if(!isalpha(current) && !isdigit(current))
            {
                switch(current)
                {
                    // possible comment case
                    case '/':
                        current = fgetc(ifp);
                        
                        // if comment, loop until the end of the comment (or end of file)
                        if(current == '*')
                        {
                            while(found == 0 && current != EOF)
                            {
                                current = fgetc(ifp);
                                // possible end of comment; a run of '*' may come before the '/'
                                while(current == '*')
                                {
                                    current = fgetc(ifp);
                                    // if "/", end of comment found
                                    if(current == '/')
                                        found = 1;
                                }
                            }
                        }
                        
                        // else print slashsym
                        else
                        {
                            printf("/\t%d\n", slashsym);
                            no_scan = 1;
                        }
                        break;
                        
                    // various single char operator and special symbol cases
                    case '*':
                        printf("*\t%d\n", multsym);
                        break;
                    case '+':
                        printf("+\t%d\n", plussym);
                        break;
                    case '-':
                        printf("-\t%d\n", minussym);
                        break;
                    case '(':
                        printf("(\t%d\n", lparentsym);
                        break;
                    case ')':
                        printf(")\t%d\n", rparentsym);
                        break;
                    case ',':
                        printf(",\t%d\n", commasym);
                        break;
                    case '.':
                        printf(".\t%d\n", periodsym);
                        break;
                    case ';':
                        printf(";\t%d\n", semicolonsym);
                        break;
                    case '=':
                        printf("=\t%d\n", eqlsym);
                        break;
                    case '<':
                        // check next char
                        current = fgetc(ifp);
                        if(current == '=')
                            printf("<=\t%d\n", leqsym);
                        else if(current == '>')
                            printf("<>\t%d\n", neqsym);
                        else
                        {
                            printf("<\t%d\n", lessym);
                            // the look-ahead char starts the next token, so don't read past it
                            no_scan = 1;
                        }
                        break;
                    case '>':
                        // check next char
                        current = fgetc(ifp);
                        if(current == '=')
                            printf(">=\t%d\n", geqsym);
                        else
                        {
                            printf(">\t%d\n", gtrsym);
                            no_scan = 1;
                        }
                        break;
                        
                    // possible becomessym case
                    // error here if no '=' ?
                    case ':':
                        // scan for next char
                        current = fgetc(ifp);
                        // check for becomessym case, print symbol and associated int if found
                        if(current == '=')
                            printf(":=\t%d\n", becomessym);
                        else
                            no_scan = 1;
                        break;
                    default:
                        printf("\nERR: Unidentified token detected\n");
                        return;
                }
            }
            
            // likely variable or reserved word case
            if(isalpha(current))
            {
                // scans for reserved words or variables
                while(found == 0 && counter < 13)
                {
                    string[counter] = current;
                    counter++;
                    current = fgetc(ifp);
                    
                    // a digit means the word cannot be a reserved word
                    if(isdigit(current))
                        reserved = 1;
                    
                    if(isspace(current))
                        found = 1;
                    
                    else if(!isalpha(current) && !isdigit(current))
                    {
                        // Weren't these just for end-of-comment checking?
                        found = 1;
                        no_scan = 1;
                        
                        // if a symbol is in an alphanumeric string, then it is an invalid token
                        printf("\nERR: Unidentified Token %s",string);
                        return;
                    }
                }
                
                // print as an identifier if it cannot be a reserved word
                if(reserved == 1 || counter < 2 || counter > 9)
                    printf("%s\t%d\n", string, identsym);
                // test string for reserved words
                else if(reserved == 0)
                {
                    if(counter < 3)
                    {
                        // test for if, do
                        if(strcmp(string,"if") == 0)
                            printf("if\t%d\n", ifsym);
                        
                        else if(strcmp(string,"do") == 0)
                            printf("do\t%d\n", dosym);
                        
                        else
                            printf("%s\t%d\n", string, identsym);
                    }
                    else if(counter < 4)
                    {
                        // test for var, end, odd
                        if(strcmp(string, "var") == 0)
                            printf("var\t%d\n", varsym);
                        
                        else if(strcmp(string, "end") == 0)
                            printf("end\t%d\n", endsym);
                        
                        else if(strcmp(string, "odd") == 0)
                            printf("odd\t%d\n", oddsym);
                        
                        else
                            printf("%s\t%d\n", string, identsym);
                    }
                    else if(counter < 5)
                    {
                        // test for then, else, read, call
                        if(strcmp(string, "then") == 0)
                            printf("then\t%d\n", thensym);
                        
                        else if(strcmp(string, "else") == 0)
                            printf("else\t%d\n", elsesym);
                        
                        else if(strcmp(string, "read") == 0)
                            printf("read\t%d\n", readsym);
                        
                        else if(strcmp(string, "call") == 0)
                            printf("call\t%d\n", callsym);
                        
                        else
                            printf("%s\t%d\n", string, identsym);
                    }
                    else if(counter < 6)
                    {
                        // test for const, begin, while, write
                        if(strcmp(string, "const") == 0)
                            printf("const\t%d\n", constsym);
                        
                        else if(strcmp(string, "begin") == 0)
                            printf("begin\t%d\n", beginsym);
                        
                        else if(strcmp(string, "while") == 0)
                            printf("while\t%d\n", whilesym);
                        
                        else if(strcmp(string, "write") == 0)
                            printf("write\t%d\n", writesym);
                        
                        else
                            printf("%s\t%d\n", string, identsym);
                    }
                    
                    // test string for procedure
                    else if(strcmp(string, "procedure") == 0)
                        printf("procedure\t%d\n", procsym);
                    
                    // otherwise it's an identifier (or should throw an error if too long)
>>>>>>> 632cad1a5dc69676303b2d67151f2202ddbc099d
                    else
                        printf("%s\t%d\n", string, identsym);
                }
                
<<<<<<< HEAD
                // test string for procedure
                else if(strcmp(string, "procedure") == 0)
                    printf("procedure\t%d\n", procsym);
            
                // throw error if identifier longer than 12 characters
                else if(counter > 11)
                {
                    printf("\nERR: Identifier longer than 12 characters detected\n");
                    return;
                }
                    
                // otherwise it's an identifier
                else
                    printf("%s\t%d\n", string, identsym);
            }
            
            else
                printf("%s\t%d\n", string, identsym);

            // reinitialize string & counter
            for(i = 0; i < 12; i++)
                string[i] = '\0';
            counter = 0;
            reserved = 0;
        }
        
        // scans in and prints integers
        // should print error if alphabetical char comes right after an integer char
        if(isdigit(current))
        {
            while(found == 0 && counter < 6)
            {
                string[counter] = current;
                counter++;
                current = fgetc(ifp);
                if(!isdigit(current))
                {
                    if(isalpha(current))
                    {
                        // alpha-num error handling
                        printf("\nERR: Invalid alphanumeric combination\n");
                        return;
                    }
                    else if(!isspace(current))
                        no_scan = 1;
                    found = 1;
                }
            }
            num = atoi(string);
            
            if(num > 65535){
                printf("\nERR: Integer Value larger than 65,535 detected\n");
                return;
            }
            printf("%d\t%d\n", num, numbersym);

            // reinitialize string & counter
            for(i = 0; i < 12; i++)
                string[i] = '\0';
            counter = 0;
        }
    }
=======
                else
                    printf("%s\t%d\n", string, identsym);
                
                // reinitialize string & counter
                for(i = 0; i < 12; i++)
                    string[i] = '\0';
                counter = 0;
                reserved = 0;
            }
            
            // scans in and prints integers
            // should print error if alphabetical char comes right after an integer char
            if(isdigit(current))
            {
                while(found == 0 && counter < 6)
                {
                    string[counter] = current;
                    counter++;
                    current = fgetc(ifp);
                    if(!isdigit(current))
                    {
                        if(isalpha(current))
                        {
                            // alpha-num error handling
                            printf("\nERR: Invalid alphanumeric combination\n");
                            return;
                        }
                        else if(!isspace(current))
                            no_scan = 1;
                        found = 1;
                    }
                }
                num = atoi(string);
                
                if(num > 65535){
                    printf("\nERR: Integer Value larger than 65,535 detected\n");
                    return;
                }
                printf("%d\t%d\n", num, numbersym);
                
                // reinitialize string & counter
                for(i = 0; i < 12; i++)
                    string[i] = '\0';
                counter = 0;
            }
        }
        
        else
            length = 0;
>>>>>>> 632cad1a5dc69676303b2d67151f2202ddbc099d
    }
}