SIMPLE TEXT FORMAT 1.0.2


NAME
SYNOPSIS
OPTIONS
DESCRIPTION
SOURCE
LICENSE
VERSION
AUTHOR

NAME

	stf

SYNOPSIS

	stf [-v] [-h headerfile] [-f footerfile] [-i] [-t] [-u] [-l] [-s] filename

OPTIONS

	-h headerfile		insert header file before stf output
	-f footerfile		insert footer file after stf output
	-i			include table of contents
	-t			include timestamp after stf output
				before footer file
	-u			convert doc- & subtitles to uppercase
	-l			convert doc- & subtitles to lowercase
	-s			only parse the TOC for a file (beta)
	-v			display version information and exit

DESCRIPTION

Simple Text Format (STF) is a simple set of rules for parsing plain text files to create HTML output. STF was initially set up to convert readable text files to HTML, mainly for manuals and tutorials.

Note that each line is restricted to a maximum of 200 characters. All characters after the 199th are discarded.
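
The limit described above can be illustrated with a minimal C sketch. It assumes a 200-byte buffer (199 characters plus the terminating NUL). The names STF_LINE_MAX and read_line_truncated are hypothetical and do not appear in the STF source, which reads lines with plain fgets calls; this is one possible way to get the documented behaviour, not the author's implementation.

```c
#include <stdio.h>
#include <string.h>

#define STF_LINE_MAX 200   /* assumed buffer size: 199 characters + NUL */

/* Read one line into buf, keeping at most STF_LINE_MAX - 1 characters.
 * Anything left on an over-long line (including its newline) is discarded.
 * Returns 1 when a line was read, 0 at end of file. */
int read_line_truncated(FILE *fp, char buf[STF_LINE_MAX]) {
  int c;

  if (fgets(buf, STF_LINE_MAX, fp) == NULL) {
    return 0;
  }
  if (strchr(buf, '\n') == NULL) {
    /* No newline in the buffer: the line was too long, skip its remainder. */
    while ((c = fgetc(fp)) != EOF && c != '\n') {
      /* discard */
    }
  }
  return 1;
}
```

A caller would loop on read_line_truncated(fp, buf) until it returns 0; a truncated line arrives without its trailing newline.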

STF currently 'knows' 6 types of content. These content types are:

	- document title
	- sub title
	- text paragraph
	- code paragraph
	- list items
	- include file

For all these types of content STF has some rules which should be followed. (Note that STF looks at the first character of a line to determine the content type of the current line. STF considers any character other than [tab][newline][-][+] a 'normal character'.)

	Document title
	- starts with the first line of the document
	- ends when a blank line is encountered
	
	Sub title
	- starts when the first character of a line is a normal character
	  and is preceded by more than one blank line
	- ends when a blank line is encountered
	
	Text paragraph
	- starts when the first character of a line is a normal character
	  and is preceded by exactly one blank line
	- ends when a blank line is encountered
	
	Code paragraph
	- starts when the first character of a line is a [tab] character
	- ends when a blank line is encountered
	
	List items
	- starts when the first character of a line is a [-] character
	  and is preceded by one or more blank lines
	- ends when a blank line is encountered
	
	Include file
	- starts when the first character of a line is a [+] character
	  and is preceded by one or more blank lines
	- ends immediately

Once a content type has been started by one of the above rules, STF continues with that content type until a blank line is encountered. This means that the first character of a line loses its STF meaning inside a content block.
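
The rules above can be sketched as a small classifier in C. The enum values follow the type numbers used in the source listing further down; the names stf_type, start_type and classify_lines are hypothetical. The 'ends immediately' rule for include files is not modelled here: an include line is treated like any other block start. This is a sketch of the described rules, not the author's code.

```c
#include <stddef.h>

enum stf_type {
  STF_BLANK, STF_TEXT, STF_CODE, STF_DOCTITLE,
  STF_SUBTITLE, STF_LIST, STF_INCLUDE
};

/* Type a line would start, judged by its first character alone. */
static enum stf_type start_type(const char *line, int blank_lines_before) {
  switch (line[0]) {
    case '\n':
    case '\0': return STF_BLANK;
    case '\t': return STF_CODE;
    case '-':  return STF_LIST;
    case '+':  return STF_INCLUDE;
    default:   return blank_lines_before > 1 ? STF_SUBTITLE : STF_TEXT;
  }
}

/* Classify n lines; a started block keeps its type until a blank line. */
void classify_lines(const char *const lines[], size_t n, enum stf_type out[]) {
  enum stf_type prev = STF_BLANK;
  int blanks = 0;
  size_t i;

  for (i = 0; i < n; i++) {
    enum stf_type t;
    if (i == 0) {
      t = STF_DOCTITLE;                 /* first line is always the title */
    } else {
      t = start_type(lines[i], blanks);
      if (t != STF_BLANK && prev != STF_BLANK) {
        t = prev;                       /* inside a block: first char ignored */
      }
    }
    if (t == STF_BLANK) {
      blanks++;
    } else {
      blanks = 0;
    }
    out[i] = t;
    prev = t;
  }
}
```

With this sketch, a line such as "-not a list item" that directly follows a text line is classified as text, because no blank line separates the two.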

The [+] character marks a special type of content. All characters directly after the [+] character are seen by STF as a filename. STF opens the file and includes its complete content in the output. All [<] and [>] characters are replaced with their HTML entities, &lt; and &gt; respectively.

In addition, all [<] and [>] characters in lines which belong to a code paragraph are replaced with their HTML entities.
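
The entity replacement described above can be sketched in C as follows. The function name escape_angle_brackets and the choice to write into a caller-supplied buffer with an explicit size check are assumptions made for this illustration. As in the description, only [<] and [>] are replaced; other characters, including [&], pass through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Copy src to dst, replacing '<' with "&lt;" and '>' with "&gt;".
 * Returns 0 on success, -1 if dst (dstsize bytes) is too small. */
int escape_angle_brackets(const char *src, char *dst, size_t dstsize) {
  size_t used = 0;

  for (; *src != '\0'; src++) {
    const char *rep;
    size_t len;

    if (*src == '<') {
      rep = "&lt;"; len = 4;
    } else if (*src == '>') {
      rep = "&gt;"; len = 4;
    } else {
      rep = src; len = 1;
    }
    if (used + len + 1 > dstsize) {
      return -1;                        /* keep room for the NUL */
    }
    memcpy(dst + used, rep, len);
    used += len;
  }
  if (dstsize == 0) {
    return -1;
  }
  dst[used] = '\0';
  return 0;
}
```

Because each replaced character grows to four, an output buffer four times the size of the input line (plus one byte) is always large enough.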

All content types have default HTML tags bound to them:

	CONTENT TYPE		DEFAULT HTML TAG
	document title		<h1></h1>
	sub title		<h2></h2>
	text paragraph		<p></p>
	code paragraph		<pre></pre>
	list items		<ul><li></ul>
	include file		<pre><i></i></pre>

However, all HTML tags can be customized. Create an STF configuration file named 'stf.conf' in the /etc/ directory. The format of the configuration file looks like this:

	IDENTIFIER=HTMLTAG

Identifiers are:

	sd=Start document title
	ed=End document title
	ss=Start sub title
	es=End sub title
	st=Start text paragraph
	et=End text paragraph
	sc=Start code paragraph
	ec=End code paragraph
	sl=Start list
	el=End list
	sb=List item
	si=Start include file
	ei=End include file
	sa=Start anchor TOC
	ea=End anchor TOC
	so=Start TOC
	eo=End TOC
	sv=Version Information

Note that the maximum number of characters for an HTML tag is restricted to 256.
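
Reading an IDENTIFIER=HTMLTAG line could look roughly like the C sketch below. It assumes a table of two-letter identifiers with a 256-byte value each (255 characters plus NUL); the names stf_tag, STF_TAG_MAX and apply_conf_line are hypothetical and do not appear in the STF source, which handles each identifier in a switch statement instead.

```c
#include <string.h>

#define STF_TAG_MAX 256    /* assumed storage per tag, including the NUL */

struct stf_tag {
  const char *id;            /* two-letter identifier, e.g. "sd" */
  char value[STF_TAG_MAX];   /* HTML tag text */
};

/* Apply one "IDENTIFIER=HTMLTAG" line to the table.
 * Returns 1 if an entry was updated, 0 if the line was ignored. */
int apply_conf_line(struct stf_tag *tags, size_t ntags, const char *line) {
  size_t i, len;

  if (strlen(line) < 3 || line[2] != '=') {
    return 0;                           /* not shaped like "xx=..." */
  }
  for (i = 0; i < ntags; i++) {
    if (strncmp(line, tags[i].id, 2) == 0) {
      len = strcspn(line + 3, "\r\n");  /* value without the line ending */
      if (len >= STF_TAG_MAX) {
        len = STF_TAG_MAX - 1;          /* truncate over-long tags */
      }
      memcpy(tags[i].value, line + 3, len);
      tags[i].value[len] = '\0';
      return 1;
    }
  }
  return 0;
}
```

A caller would fill the table with the defaults listed earlier and then feed every line of the configuration file through apply_conf_line; unknown identifiers and malformed lines leave the defaults untouched.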

Here is an example file:


sv=

SOURCE

Download stf-1.0.tar here (source and manual included) or view the source here.


/*****************************************************************************************************************
 SIMPLE TEXT FORMAT
 Version:	1.01
 Date:          28 July 2000
 Author:        Fred Wijnsma
 Email address:	wijnsma@yahoo.com
 Homepage:      www.hacom.nl/~wijnsma/stf/

 STILL TODO AND BUGS TO BE FIXED
 - when a content type has started in some situations it will be changed to another content type without a blank
   line, this should be fixed !
 - create a -d argument which also outputs (probably to a file as secondary output) debugging information
   for development purposes
 - re-code the argument parsing with the getopt function and follow the GNU argument convention
 - create a filename convention which makes it possible to create "index" pages based on a part of the filename
   document title and the sub titles
 - create a Makefile which installs stf and the manual page
 - create a package for download with:
   - stf.c	-> source file
   - stf	-> compiled version
   - stf.txt	-> example plain text file
   - stf.html	-> example output file
   - stf.conf	-> example configuration file
   - stf.man	-> manual page
   - README	-> readme file
   - INSTALL	-> installation instructions
   - do_stf.sh	-> example shell script
*****************************************************************************************************************/

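#include <stdio.h>
#include <stdlib.h>                                                     /* exit()                               */
#include <string.h>
#include <time.h>
#include <ctype.h>

/* Prototypes for functions that are called before they are defined */
int change_case(int state, char caseline[1024]);
int replace_html(char myline[1024]);
int include_file(char fname[256]);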

/*****************************************************************************************************************
 VERSION INFORMATION (only used internal)
*****************************************************************************************************************/
char version[256]     = "*!STF version 1.0.2 (c) 2000 QaD!*";           /* Version information, not used        */

/*****************************************************************************************************************
 DEFAULT VARIABLES CHANGE WHEN NEEDED
*****************************************************************************************************************/
char sd[256]          = "<h1>";                                         /* Document title start                 */
char ed[256]          = "</h1>";                                        /* Document title end                   */
char ss[256]          = "<h2>";                                         /* Sub title start                      */
char es[256]          = "</h2>";                                        /* Sub title end                        */
char st[256]          = "<p>";                                          /* Text paragraph start                 */
char et[256]          = "</p>";                                         /* Text paragraph end                   */
char sc[256]          = "<pre>";                                        /* Code paragraph start                 */
char ec[256]          = "</pre>";                                       /* Code paragraph end                   */
char sl[256]          = "<ul>";                                         /* List start                           */
char el[256]          = "</ul>";                                        /* List end                             */
char sb[256]          = "<li>";                                         /* List item                            */
char si[256]          = "<pre><i>";                                     /* Include start                        */
char ei[256]          = "</i></pre>";                                   /* Include end                          */
char sa[256]          = "";                                             /* Start anchor (TOC)                   */
char ea[256]          = "";                                             /* End anchor (TOC)                     */
char so[256]          = "<hr>";                                         /* Start TOC                            */
char eo[256]          = "<hr>";                                         /* End TOC                              */

/*****************************************************************************************************************
 NO CHANGES SHOULD BE MADE BELOW THIS LINE
*****************************************************************************************************************/
FILE *inputfile;                                                        /* File to process                      */
FILE *incfile;                                                          /* Include file                         */
FILE *insertfile;                                                       /* Header or footer file to insert      */
FILE *tocfile;                                                          /* Table of contents file               */

int firstline        = 0;                                               /* If 1 first line is parsed            */
int linetype         = 0;                                               /* Type of line                         */
int prev_linetype    = 0;                                               /* Previous type of line                */
int counter          = 0;                                               /* Number of blank lines                */
int incheader        = 0;                                               /* Include header when 1                */
int incfooter        = 0;                                               /* Include footer when 1                */
int inctoc           = 0;                                               /* Include TOC when 1                   */
int inctime          = 0;                                               /* Display timestamp                    */
int sitemap          = 0;                                               /* Parse sitemap information            */
int charcase         = 0;                                               /* Char case, 1 upper, 2 lower          */

char filename[256]   = "";                                              /* Filename to parse                    */
char configname[256] = "/etc/stf.conf";                                 /* Configuration filename to parse      */
char lines[1024]     = "";                                              /* Line to parse                        */
char toclines[1024]  = "";                                              /* Line to parse for TOC                */
char codeline[1024]  = "";                                              /* Line parsed and html replaced        */
char headername[256] = "";                                              /* Header filename to process           */
char footername[256] = "";                                              /* Footer filename to process           */
char thetime[1024]   = "";                                              /* Time of formatting                   */
char doctitle[1024]  = "";                                              /* Document title for sitemap           */


/*****************************************************************************************************************
 PARSING THE ARGUMENTS
 Note that this function should actually be rewritten to use of the "getopt" (3) function
 Also the convention as specified on the GNU site should be followed (e.g. long and short argument notation)
*****************************************************************************************************************/
int parse_args(int argcount, char *arglist[]) {
  int i       = 0;
    if (argcount < 2) {                                                 /* No arguments exit                    */
      printf("Usage: %s [-v] [-h headerfile] [-f footerfile] [-i] [-t] [-u] [-l] [-s] filename\n", arglist[0]);
      exit(1);
    }
    for (i = 1; i <= argcount - 1; i++) {
      if ( strcmp(arglist[i], "-v") == 0) {                             /* Display version information          */
          printf("%s", version);
          exit(2);
      } else if (i == argcount -1) {
        strcpy(filename, arglist[i]);                                   /* Filename                             */
      } else if ( strcmp(arglist[i], "-h") == 0) {                      /* Header filename                      */
        i++;
        strcpy(headername, arglist[i]);
        incheader = 1;
      } else if ( strcmp(arglist[i], "-f") == 0) {                      /* Footer filename                      */
        i++;
        strcpy(footername, arglist[i]);
        incfooter = 1;
      } else if ( strcmp(arglist[i], "-i") == 0) {                      /* Include TOC                          */
        inctoc = 1;
      } else if ( strcmp(arglist[i], "-t") == 0) {                      /* Include timestamp                    */
        inctime = 1;
      } else if ( strcmp(arglist[i], "-s") == 0) {                      /* Only create sitemap information      */
        sitemap = 1;
      } else if ( strcmp(arglist[i], "-u") == 0) {                      /* Convert doc/sub titles to uppercase  */
        charcase = 1;
      } else if ( strcmp(arglist[i], "-l") == 0) {                      /* Convert doc/sub titles to lowercase  */
        charcase = 2;
      } else {
        printf("Usage: %s [-v] [-h headerfile] [-f footerfile] [-i] [-t] [-u] [-l] [-s] filename\n",arglist[0]);
        exit(1);
      }
    }
    return(0);
}

/*****************************************************************************************************************
 CREATE TABLE OF CONTENTS
 Table of Contents is parsed based on the sub_titles
*****************************************************************************************************************/
void parse_toc() {
  int validtoc   = 0;
  int toccounter = 0;
  tocfile = fopen(filename, "r");
  if(tocfile == NULL) {
    printf("Cannot open file %s for parsing the TOC\n", filename);
    exit(1);
  }
  /* printf("%s\n",so); */
  while ( (fgets(toclines, 1024, tocfile)) ) {
    if (toclines[0] == '\n') {
      toccounter++;
    } else {
      if (toccounter > 1) {
        if (validtoc == 0) {
          printf("%s\n",so);
          validtoc = 1;
        }
        toclines[strlen(toclines)-1] = '\0';
        change_case(3,toclines);
        /* printf("<a href=\"#%s\">%s%s%s</a><br>\n",toclines,sa,toclines,ea); */
        toccounter = 0;
      } else {
        toccounter = 0;
      }
    }
  }
  if (validtoc != 0) {
    printf("%s\n",eo);
  }
  fclose(tocfile);
}

/*****************************************************************************************************************
 DETERMINE CONTENT TYPES
 Function which determines the type of line we are dealing with based on the first character of the line
 Types can be either one of the following:
 0 = blank line
 1 = text line
 2 = code line
 3 = document title line (started in the main function)
 4 = sub title line
 5 = list line
 6 = include file line
*****************************************************************************************************************/
int determine_type(char myline[1024]) {
  switch (myline[0]) {
    case '\n':                                                          /* Blank line                           */
      linetype = 0;
      break;
    case '\t':                                                          /* Code line                            */
      linetype = 2;
      break;
    case '-':                                                           /* List line                            */
      linetype = 5;
      break;
    case '+':                                                           /* Include file                         */
      linetype = 6;
      break;
    default:
      if (counter > 1) {
        linetype = 4;                                                   /* Subtitle line                        */
      } else {
        linetype = 1;                                                   /* Text line                            */
      }
    }
  return(linetype);
}

/*****************************************************************************************************************
 PARSING THE CONFIGURATION FILE
 Reads "/etc/stf.conf". If the file cannot be found, STF will use the defaults specified in "DEFAULT VARIABLES".
 An argument for selecting another configuration file is not implemented.
*****************************************************************************************************************/
int parse_conf() {
  FILE *configfile;                                                     /* Configuration file                   */
  char configname[256]    = "/etc/stf.conf";                            /* Configuration filename to parse      */
  char confline[256]      = "";                                         /* Line to parse for conf file          */
  configfile = fopen(configname, "r");
  if(configfile == NULL) {
    /* When we can't find the configuration file, we will use the defaults                                      */
    /* printf("Error parsing configuration file (%s)\n", configname);                                           */
  } else {
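    while ( (fgets(confline, 256, configfile)) ) {
      if (strlen(confline) < 4 || confline[2] != '=') continue;         /* Skip lines not shaped like "xx=tag"  */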
      switch (confline[0]) {
        case 's':                                                       /* Line starts with a "s", start tag    */
          switch (confline[1]) {
            case 'd':                                                   /* Start Document title (sd)            */
              strncpy(sd,&(confline[3]),256);
              sd[strlen(sd)-1] = '\0';
              break;
            case 's':                                                   /* Start Sub title (ss)                 */
              strncpy(ss,&(confline[3]),256);
              ss[strlen(ss)-1] = '\0';
              break;
            case 't':                                                   /* Start Text (st)                      */
              strncpy(st,&(confline[3]),256);
              st[strlen(st)-1] = '\0';
              break;
            case 'c':                                                   /* Start Code (sc)                      */
              strncpy(sc,&(confline[3]),256);
              sc[strlen(sc)-1] = '\0';
              break;
            case 'l':                                                   /* Start List (sl)                      */
              strncpy(sl,&(confline[3]),256);
              sl[strlen(sl)-1] = '\0';
              break;
            case 'b':                                                   /* List Item (sb)                       */
              strncpy(sb,&(confline[3]),256);
              sb[strlen(sb)-1] = '\0';
              break;
            case 'i':                                                   /* Start Include File (si)              */
              strncpy(si,&(confline[3]),256);
              si[strlen(si)-1] = '\0';
              break;
            case 'a':                                                   /* Start Anchor (sa)                    */
              strncpy(sa,&(confline[3]),256);
              sa[strlen(sa)-1] = '\0';
              break;
            case 'v':                                                   /* Version Information (version)        */
              strncpy(version,&(confline[3]),256);
              version[strlen(version)-1] = '\0';
              break;
            case 'o':                                                   /* Start TOC (so)                       */
              strncpy(so,&(confline[3]),256);
              so[strlen(so)-1] = '\0';
              break;
          }
          break;
        case 'e':                                                       /* Line starts with a "e", end tag      */
          switch (confline[1]) {
            case 'd':                                                   /* End Document title (ed)              */
              strncpy(ed,&(confline[3]),256);
              ed[strlen(ed)-1] = '\0';
              break;
            case 's':                                                   /* End Sub title (es)                   */
              strncpy(es,&(confline[3]),256);
              es[strlen(es)-1] = '\0';
              break;
            case 't':                                                   /* End Text (et)                        */
              strncpy(et,&(confline[3]),256);
              et[strlen(et)-1] = '\0';
              break;
            case 'c':                                                   /* End Code (ec)                        */
              strncpy(ec,&(confline[3]),256);
              ec[strlen(ec)-1] = '\0';
              break;
            case 'l':                                                   /* End List (el)                        */
              strncpy(el,&(confline[3]),256);
              el[strlen(el)-1] = '\0';
              break;
            case 'i':                                                   /* End Include File (ei)                */
              strncpy(ei,&(confline[3]),256);
              ei[strlen(ei)-1] = '\0';
              break;
            case 'a':                                                   /* End Anchor (ea)                      */
              strncpy(ea,&(confline[3]),256);
              ea[strlen(ea)-1] = '\0';
              break;
            case 'o':                                                   /* End TOC (eo)                         */
              strncpy(eo,&(confline[3]),256);
              eo[strlen(eo)-1] = '\0';
              break;
            }
          break;
        }
      }
    fclose(configfile);
  }
  return(0);                                                            /* Also reached when no config exists   */
}

/*****************************************************************************************************************
 CONTENT TYPE END BLOCK
 Print end of a content block based on the current content type
 Types can be either one of the following:
 0 = blank line (cannot be ended %)
 1 = text line
 2 = code line
 3 = document title line (started in the main function)
 4 = sub title line
 5 = list line
 6 = include file line
*****************************************************************************************************************/
int end_block(int mytype, char myline[1024]) {
  /*        int i           = 0; */
  switch (mytype) {
    case 1:                                                             /* End txt block                        */
      lines[strlen(lines)-1] = '\0';
      printf("%s",lines);
      printf("%s\n",et);
      break;
    case 2:                                                             /* End code block                       */
      lines[strlen(lines)-1] = '\0';
      printf("%s", lines);
      printf("%s\n",ec);
      break;
    case 3:                                                             /* End document title                   */
      lines[strlen(lines)-1] = '\0';
      change_case(1,lines);
      printf("%s\n",ed);
      if (inctoc == 1) {
        parse_toc();
      }
      break;
    case 4:                                                             /* End subtitle block                   */
      lines[strlen(lines)-1] = '\0';
      change_case(1,lines);
      printf("\n%s\n",es);
      break;
    case 5:                                                             /* End list items block                 */
      lines[strlen(lines)-1] = '\0';
      printf("%s",lines);
      printf("%s\n",el);
      break;
    case 6:                                                             /* End include file                     */
      break;
  }
  return(0);
}

/*****************************************************************************************************************
 CONTENT TYPE START BLOCK
 Print start of a content block based on the current content type
 Types can be either one of the following:
 0 = blank line (cannot be started %)
 1 = text line
 2 = code line
 3 = document title line (started in the main function)
 4 = sub title line
 5 = list line
 6 = include file line
*****************************************************************************************************************/
int start_block(int mytype, char myline[1024]) {
  int i = 0;
  char incfile[256];
  switch (mytype) {
    case 1:                                                             /* Start txt block                      */
      printf("%s\n",st);
      printf("%s",lines);
      break;
    case 2:                                                             /* Start code block                     */
      printf("%s\n",sc);
      replace_html(lines);
      break;
    case 3:                                                             /* Start document title                 */
      printf("%s\n",sd);
      change_case(1,lines);
      break;
    case 4:                                                             /* Start subtitle block                 */
      lines[strlen(lines)-1] = '\0';
      printf("%s\n",ss);
      change_case(2,lines);
      break;
    case 5:                                                             /* Start list items block               */
      lines[0] = ' ';
      printf("%s\n%s",sl,sb);
      printf("%s",lines);
      break;
    case 6:                                                             /* Start include file block             */
      strcpy(incfile,"");
      for (i = 1; i < strlen(lines) && i < (int)sizeof(incfile); i++) { /* Do not write past the name buffer    */
        strncat(incfile, &lines[i], 1);
      }
      include_file(incfile);
      break;
  }
  return(0);
}

/*****************************************************************************************************************
 REPLACE HTML INTERPRETED CHARACTERS WITH THEIR CORRESPONDING ENTITIES
 For now only the < and > characters are replaced
*****************************************************************************************************************/
int replace_html(char myline[1024]) {
  size_t i;
  /* Write directly to stdout so that expanded entities cannot overflow a fixed buffer */
  for (i = 0; myline[i] != '\0'; i++) {
    if (myline[i] == '>') {
      fputs("&gt;", stdout);
    } else if (myline[i] == '<') {
      fputs("&lt;", stdout);
    } else {
      putchar(myline[i]);
    }
  }
  return 0;
}

/*****************************************************************************************************************
 CHANGE CASE OF CHARACTERS
 If "charcase" matches:
 0 -> do not change case state
 1 -> change to uppercase
 2 -> change to lowercase
 If "state" matches:
 1 -> format line appearance as document title
 2 -> format line appearance as sub title
 3 -> format line appearance as TOC item
*****************************************************************************************************************/
int change_case(int state, char caseline[1024]) {
  size_t i;
  switch (charcase) {
    case 0:                                                             /* Leave the line unchanged             */
      break;
    case 1:
      for(i=0;i<strlen(caseline);i++)
        caseline[i] = toupper((unsigned char)caseline[i]);
      break;
    case 2:
      for(i=0;i<strlen(caseline);i++)
        caseline[i] = tolower((unsigned char)caseline[i]);
      break;
  }
  switch (state) {
    case 1:
      printf("%s",caseline);
      break;
    case 2:
      printf("<a name=\"%s\">%s</a>",caseline,caseline);
      break;
    case 3:
      printf("<a href=\"#%s\">%s%s%s</a><br>\n",caseline,sa,caseline,ea);
      break;
  }
  return 0;
}

/*****************************************************************************************************************
 INCLUDE FILE
 Files included with the + content type are formatted so that every appearance of the < and > character gets
 replaced by the &lt; and &gt; html entities
*****************************************************************************************************************/
int include_file(char fname[256]) {
  printf("%s\n",si);
  if (fname[0] != '\0') {
    fname[strlen(fname)-1] = '\0';                                      /* Strip the trailing newline           */
  }
  incfile = fopen(fname, "r");
  if(incfile == NULL) {
    printf("Error parsing include file (%s)\n",fname);
  } else {
    while ( (fgets(lines, 1024, incfile)) ) {
      replace_html(lines);
    }
    fclose(incfile);
  }
  printf("%s\n",ei);
  return(0);
}

/*****************************************************************************************************************
 INSERT FILE
 Files included with the -h and -f arguments are inserted as is, no additional formatting is done
*****************************************************************************************************************/
void insert_file(char insertname[256]) {
  insertfile = fopen(insertname, "r");
  if (insertfile == NULL) {
    printf("<i>Error including file (%s)</i>\n", insertname);
  } else {
    while ( (fgets(lines, 1024, insertfile)) ) {
      printf("%s", lines);
    }
    fclose(insertfile);
  }
}

int main(int argc, char *argv[])  {

  time_t timer;
  timer=time(NULL);
  strcpy(thetime, asctime(localtime(&timer)));
  thetime[strlen(thetime)-1] = '\0';

  parse_args(argc,argv);

  parse_conf();

  if (incheader == 1) insert_file(headername);

  inputfile = fopen(filename, "r");
  if(inputfile == NULL) {
    printf("Cannot open file %s (main file)\n", filename);
    exit(1);
  }
  while ( (fgets(lines, 1024, inputfile)) ) {
    if (firstline == 0) {
      linetype = 3;
      firstline = 1;
    } else {
      linetype = determine_type(lines);
    }
    if (linetype == 0) {                                    /* Current line is blank line           */
      counter++;
      if (prev_linetype == 0) {                       /* Previous line is blank line          */
        printf("%s",lines);
      } else {                                        /* Previous line is _not_ blank line    */
        end_block(prev_linetype,lines);
      }
    } else {                                                /* Current line is _not_ blank line     */
      if (prev_linetype == 0) {                       /* Previous line is blank line          */
        start_block(linetype,lines);
        counter = 0;
      } else {                                        /* Previous line is _not_ blank line    */
        linetype = prev_linetype;               /* Note we should keep the linetype     */
          if (linetype == 2) {                    /* regardless what the start char is    */
            replace_html(lines);
            counter = 0;
          } else if (linetype == 5) {
            lines[0] = ' ';
            printf("%s%s",sb,lines);
            counter = 0;
          } else {
            printf("%s",lines);
            counter = 0;
          }
       }
    }
    prev_linetype = linetype;                               /* Set PREV_TYPE to TYPE for next run   */
  }
  fclose(inputfile);
  switch (linetype) {                                             /* Close the content neatly when no     */
    case 1:                                                 /* blank is last line of the file to    */
      printf("%s\n",et);                              /* parse                                */
      break;
    case 2:
      printf("%s\n",ec);
      break;
    case 3:
      printf("%s\n",ed);
      break;
    case 4:
      printf("%s",es);
      break;
    case 5:
      printf("%s\n",el);
      break;
    case 6:
      printf("%s\n",ei);
      break;
  }
  if (inctime == 1) printf("<p align=\"right\"><font size=\"1\">Formatted %s %s</font></p>\n", thetime,version);
  if (incfooter == 1) insert_file(footername);
  return(0);
}

LICENSE

Free to use, modify or delete :)

VERSION

STF Version 1.0 QaD (c) 2000

AUTHOR

Fred Wijnsma [wijnsma@hacom.nl]

Formatted Sun Jul 30 01:22:36 2000