This file lists projects still to be done for the GNU Fortran system.
Copyright (C) 1995 Free Software Foundation, Inc.  You may copy,
distribute, and modify it freely as long as you preserve this copyright
notice and permission notice.  Contributed by James Craig Burley
(burley@gnu.ai.mit.edu).

1995-11-16

0. Improved efficiency.

Don't bother doing any performance analysis until most of the
following items are taken care of, because there's no question
they represent serious space/time problems, although some of
them show up only given certain kinds of (popular) input.

* Improve malloc package and its uses to specify more info about
  memory pools and, where feasible, use obstacks to implement them.

* Skip over uninitialized portions of aggregate areas (arrays,
  COMMON areas, EQUIVALENCE areas) so zeros need not be output.
  This would reduce memory usage for large initialized aggregate
  areas, even ones with only one initialized element.
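
  A minimal sketch of the bookkeeping involved, assuming (hypothetically)
  that an aggregate's initialized parts are available as a sorted,
  non-overlapping list of byte ranges.  Only those ranges need explicit
  data; each gap could become one skip/zero-fill directive instead of a
  run of zeros.  The names init_piece, agg_plan, and plan_aggregate are
  invented for illustration and are not g77 internals.

```c
#include <stddef.h>

/* Hypothetical description of one initialized byte range of an
   aggregate area (array, COMMON area, or EQUIVALENCE area). */
struct init_piece {
    unsigned long offset; /* byte offset from the start of the area */
    unsigned long size;   /* number of initialized bytes */
};

/* Summary of what would actually have to be emitted. */
struct agg_plan {
    unsigned long explicit_bytes; /* bytes written out as real data */
    unsigned long skipped_bytes;  /* bytes covered by skip/zero-fill gaps */
    size_t gaps;                  /* number of separate gaps */
};

/* PIECES must be sorted by offset and must not overlap; TOTAL is the
   size of the whole area in bytes. */
struct agg_plan plan_aggregate(const struct init_piece *pieces, size_t n,
                               unsigned long total)
{
    struct agg_plan plan = {0, 0, 0};
    unsigned long pos = 0; /* first byte not yet accounted for */

    for (size_t i = 0; i < n; ++i) {
        if (pieces[i].offset > pos) { /* uninitialized gap before piece */
            plan.skipped_bytes += pieces[i].offset - pos;
            plan.gaps++;
        }
        plan.explicit_bytes += pieces[i].size;
        pos = pieces[i].offset + pieces[i].size;
    }
    if (total > pos) { /* trailing uninitialized gap */
        plan.skipped_bytes += total - pos;
        plan.gaps++;
    }
    return plan;
}
```

  For a REAL array of 1000000 elements with only element 101 initialized
  (byte offset 400), this reports 4 explicit bytes and two gaps covering
  the other 3999996 bytes.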

* Prescan the statement (in sta.c) so that the nature of the statement
  is determined as much as possible by looking entirely at its form,
  and not looking at any context (previous statements, including types
  of symbols).  This would allow ripping out the statement-
  confirmation, symbol retraction/confirmation, and diagnostic inhibition
  mechanisms.  Plus, it would result in much-improved diagnostics.  For
  example, "CALL some-intrinsic(...)", where the intrinsic is not a
  subroutine intrinsic, would result in an actual error instead of the
  unimplemented-statement catch-all.
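
  As a rough illustration of deciding a statement's nature from its form
  alone, here is a sketch (not sta.c code) for the classic fixed-form
  ambiguity: with blanks removed, DO10I=1,5 is a DO statement, while
  DO10I=1.5 assigns to a variable named DO10I.  It assumes an uppercased,
  blank-stripped statement containing no character constants, and it does
  not handle DO WHILE; classify_do is a hypothetical name.

```c
#include <string.h>

enum stmt_kind { STMT_ASSIGNMENT, STMT_DO, STMT_OTHER };

/* Decide, from form alone, whether a blank-stripped statement starting
   with "DO" is a DO statement or an assignment.  A DO statement has a
   comma at parenthesis depth 0 after its '='; an assignment does not. */
enum stmt_kind classify_do(const char *s)
{
    const char *eq = NULL; /* first '=' at parenthesis depth 0 */
    int depth = 0;

    if (strncmp(s, "DO", 2) != 0)
        return STMT_OTHER;

    for (const char *p = s; *p != '\0'; ++p) {
        if (*p == '(')
            depth++;
        else if (*p == ')')
            depth--;
        else if (depth == 0 && *p == '=' && eq == NULL)
            eq = p;
        else if (depth == 0 && *p == ',' && eq != NULL)
            return STMT_DO; /* e.g. DO10I=1,5 */
    }
    return eq != NULL ? STMT_ASSIGNMENT : STMT_OTHER;
}
```

  Commas inside parentheses are ignored, so DOX(1,2)=3 is still classified
  as an assignment to an element of array DOX.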

* Throughout g77, don't pass ffewhereLine/ffewhereColumn pairs where
  a simple ffewhere type, which points to the error as much as is
  desired by the configuration, will do, and don't pass ffelexToken types
  where a simple ffewhere type will do.  Then, allow new default
  configuration of ffewhere such that the source line text is not
  preserved, and leave it to things like EMACS' next-error function
  to point to them (now that next-error supports column numbers).
  The change in calling sequences should improve performance somewhat,
  as should not having to save source lines.  It might even be possible
  to change ffewhere from a pointer to a single 32-bit item that has
  24 bits for line#, 8 bits for col#, or something like that, if it's
  worthwhile for performance' sake at that point.  It might also be
  worthwhile to make it easy to configure away preservation of column
  numbers if that might make g77 faster, though with most Fortran
  programs, column numbers are quite helpful.  (Whether this whole
  item will improve performance is questionable, but it should greatly
  improve maintainability.)
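
  A minimal sketch of the single 32-bit ffewhere idea, using the 24/8 bit
  split suggested above.  The type and function names are hypothetical,
  and saturating out-of-range values at the field maximum (rather than
  wrapping) is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical compact source location: line number in the high 24
   bits, column number in the low 8 bits. */
typedef uint32_t packed_where;

#define WHERE_COL_BITS 8
#define WHERE_LINE_MAX ((1UL << 24) - 1)             /* 16777215 */
#define WHERE_COL_MAX ((1UL << WHERE_COL_BITS) - 1)  /* 255 */

/* Values too large for their field saturate at the field maximum, so a
   very long line still reports "at or past column 255". */
static packed_where where_pack(unsigned long line, unsigned long col)
{
    if (line > WHERE_LINE_MAX)
        line = WHERE_LINE_MAX;
    if (col > WHERE_COL_MAX)
        col = WHERE_COL_MAX;
    return (packed_where) ((line << WHERE_COL_BITS) | col);
}

static unsigned long where_line(packed_where w)
{
    return (unsigned long) (w >> WHERE_COL_BITS);
}

static unsigned long where_col(packed_where w)
{
    return (unsigned long) (w & WHERE_COL_MAX);
}
```

  Passing such a value by value replaces passing a pointer (or a
  line/column pair), which is where the hoped-for calling-sequence savings
  would come from.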

* Handle DATA (A(I),I=1,1000000)/1000000*2/ more efficiently, especially
  as regards the assembly output.  Some of this might require improving
  the back end, but lots of improvement in space/time required in g77
  itself can be fairly easily obtained without touching the back end.
  Maybe type conversion, where necessary, can be sped up as well in
  cases like the one shown (converting the "2" into "2.").
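
  One way to avoid expanding /1000000*2/ into a million separate
  constants is to keep the repeat count together with the value, and to
  convert the constant to the destination type once per run rather than
  once per element.  A minimal sketch, assuming a REAL destination; the
  names data_run, make_real_run, and run_value_at are hypothetical.

```c
#include <stddef.h>

/* Hypothetical run-length form of a DATA value list such as /1000000*2/. */
struct data_run {
    unsigned long count; /* repeat count, e.g. 1000000 */
    float value;         /* constant, already converted to REAL */
};

/* Convert the INTEGER constant to REAL exactly once for the whole run. */
static struct data_run make_real_run(unsigned long count, long int_value)
{
    struct data_run r;
    r.count = count;
    r.value = (float) int_value;
    return r;
}

/* Look up the value for element INDEX (0-based) across a sequence of
   runs without ever expanding them.  Returns 1 and stores the value in
   *OUT, or returns 0 if INDEX lies beyond the last run. */
static int run_value_at(const struct data_run *runs, size_t n,
                        unsigned long index, float *out)
{
    for (size_t i = 0; i < n; ++i) {
        if (index < runs[i].count) {
            *out = runs[i].value;
            return 1;
        }
        index -= runs[i].count;
    }
    return 0;
}
```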

* If analysis shows it to be worthwhile, optimize lex.c.

1. Better optimization.

* Get the back end to produce at least as good code involving array
  references as does f2c+gcc.  (NOTE: 0.5.16 seems to have improved
  this, at least based on preliminary feedback during alpha testing.
  Please provide detailed information on cases where it doesn't, for
  possible future improvements.  Apparently the improvement works
  only as of gcc-2.7.0; it doesn't kick in for 2.6.3, for example.
  Further analysis shows that cases where the improvement doesn't
  occur include those involving 3-dimensional arrays, for example.)

* Do the equivalent of the trick of putting "extern inline" in front
  of every function definition in libf2c and #include'ing the resulting
  file in f2c+gcc -- that is, inline all run-time-library functions
  that are at all worth inlining.  (Some of this has already been
  done, e.g. for integral exponentiation.)
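
  As an illustration of the kind of routine worth inlining, here is
  integer exponentiation by squaring written as an inline C function.  It
  is a sketch, not libf2c's pow_ii source; ipow_inline is a hypothetical
  name, and it uses C99 "static inline" because GNU C's traditional
  "extern inline" semantics differ from C99's.  Negative exponents follow
  Fortran's truncating integer division (I**(-N) is 0 unless |I| is 1); a
  zero base with a negative exponent is a program error and is not
  diagnosed here.

```c
/* Integer exponentiation by squaring: O(log n) multiplications. */
static inline long ipow_inline(long x, long n)
{
    long result = 1;

    if (n < 0) {
        /* 1/(x**|n|) truncates toward zero unless |x| == 1. */
        if (x == 1)
            return 1;
        if (x == -1)
            return (n % 2 == 0) ? 1 : -1;
        return 0;
    }
    while (n > 0) {
        if (n & 1)
            result *= x;
        n >>= 1;
        if (n > 0)
            x *= x; /* skip the final, unneeded squaring */
    }
    return result;
}
```

  When the exponent is a compile-time constant, an inlined version like
  this lets the back end unroll the loop into a few multiplications.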

* Provide some way, a la gcc, for Fortran code to specify assembler
  code.

* When doing CHAR_VAR = CHAR_FUNC(...), and it's clear that types line up
  and CHAR_VAR is addressable or not a VAR_DECL, make CHAR_VAR, not a
  temporary, be the receiver for CHAR_FUNC.  (This is now done for
  COMPLEX variables.)

* Design and implement Fortran-specific optimizations that don't
  really belong in the back end, or where the front end needs to
  give the back end more info than it currently does.

* Design and implement a new run-time library interface, with the
  code going into libgcc so no special linking is required to
  link Fortran programs using standard language features.  This library
  would speed up lots of things, from I/O (using precompiled formats,
  doing single or small #s of calls for arrays or array sections, and
  so on) to general computing (array/section implementations of
  various intrinsics, implementation of commonly performed loops that
  aren't likely to be optimally compiled otherwise, etc.).  Most
  importantly, it would be a one-stop-shop-type library, hence shareable
  and usable by all, in that what are now
  library-build-time options in libf2c would be moved at least to the
  g77 compile phase, if not to finer grains (such as choosing how
  list-directed I/O formatting is done by default at OPEN time, for
  preconnected units via options or even statements in the main program
  unit, maybe even on a per-I/O basis with appropriate pragma-like
  devices).

* Probably requiring the new library design, change interface to
  normally have COMPLEX functions return their values in the way
  gcc would if they were declared complex float, rather than using
  the mechanism currently used by CHARACTER functions (whereby the
  functions are compiled as returning void and their first arg is
  a pointer to where to store the result).  Don't append underscores on
  external names for COMPLEX functions in some cases once g77 uses
  gcc rather than f2c calling conventions.
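
  The two conventions side by side, for a hypothetical COMPLEX function
  that doubles its argument.  The fcomplex struct mirrors f2c's layout of
  a COMPLEX value as two floats; the function names are invented, and the
  trailing underscore on the first one imitates the external-name
  decoration described above.

```c
#include <complex.h>

/* f2c-compatible layout of a Fortran COMPLEX value. */
typedef struct { float r, i; } fcomplex;

/* Current (CHARACTER-like) convention: compiled as returning void, with
   a pointer to the result as the hidden first argument.  Fortran
   arguments are passed by reference. */
void ctwice_(fcomplex *result, const fcomplex *z)
{
    result->r = 2.0f * z->r;
    result->i = 2.0f * z->i;
}

/* Proposed convention: return the value the way gcc returns a C
   "float complex", so no caller-side temporary is needed. */
float complex ctwice(const float complex *z)
{
    return 2.0f * *z;
}
```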

* Do something useful with "doiter" references where possible.  E.g.
  CALL FOO(I) cannot modify I if within a DO loop that uses I as the
  iteration variable, and the back end might find that info useful
  in determining whether it needs to read I back into a register after
  the call.  (It normally has to do that, unless it knows FOO never
  modifies its passed-by-reference argument, which is rarely the case
  for F77 code.)

2. Simpler porting.

* A new library (see above) should improve portability as well as
  produce more optimal code.  Further, g77 and the new library should
  conspire to simplify naming of externals, such as by removing unnecessarily
  added underscores, and to reduce/eliminate the possibility of naming
  conflicts, while making debugging more straightforward.  Also, it should
  make multi-language applications more feasible, such as by providing
  Fortran intrinsics that get Fortran unit numbers given C FILE *
  descriptors.

* Possibly related to a new library, g77 should produce the equivalent
  of a gcc "main(argc, argv)" function when it compiles a main program
  unit, instead of compiling something that must be called by a library
  implementation of main().  This would do many useful things such as
  provide more flexibility in terms of setting up exception handling,
  not requiring programmers to start their debugging sessions with
  "breakpoint MAIN__" followed by "run", and so on.

* The back end needs to understand the difference between alignment
  requirements and desires.  E.g. on x86 machines, g77 currently imposes
  overly strict alignment requirements, due to the back end, but it
  would be useful for Fortran and C programmers to be able to override
  these _recommendations_ as long as they don't violate the actual
  processor _requirements_.

3. More extensions.

* Support INTEGER/REAL/COMPLEX equivalents for all applicable back-end-
  supported types (char, short int, int, long int, long long int, and long
  double).  This means providing intrinsic support &c as well, and for most
  machines will result in automatic support of INTEGER*1, INTEGER*2,
  INTEGER*8, and so on.

* Provide as the default source-line model a "pure visual" mode, where
  the interpretation of a source program in this mode can be accurately
  determined by a user looking at a traditionally displayed rendition
  of the program (assuming the user knows whether the program is fixed
  or free form).  That is, assume the user cannot tell tabs from spaces
  and cannot see trailing spaces on lines, but has canonical tab stops
  and, for fixed-form source, has the ability to always know exactly
  where column 72 is.  Then provide common alternate models (Digital, f2c,
  &c) via command-line options.  This includes allowing arbitrarily long
  lines for free-form source as well as fixed-form source and providing
  pedantic limits and diagnostics as appropriate, plus even on a non-
  tabbed fixed-form line, treating a line with the first non-blank character
  starting with column 6 being a digit as a continuation line (to effect
  the "<TAB>1continuationline..." behavior in "pure visual" mode --
  actually, g77 already does this).

* Intrinsics in constant expressions.  This, plus F90 intrinsics such
  as SELECTED_INT_KIND, would give users the ability to write clear,
  portable code.

* Automatic adjustable arrays.

* Support more general expressions to dimension dummy and automatic
  adjustable arrays, such as array element references, function
  references, etc.

* A FLUSH statement that does what many systems provide via CALL FLUSH,
  but that supports * as the unit designator (same unit as for PRINT).

* Finish support for V027 VXT PARAMETER statement (like PARAMETER in
  stc but type of destination is set from type of source expression).

* Consider adding a NUMERIC type to designate typeless numeric constants,
  named and unnamed.  The idea is to provide a forward-looking, effective
  replacement for things like the VXT PARAMETER statement when people
  really need typelessness in a maintainable, portable, clearly documented
  way.  Maybe a more general TYPELESS type would also cover CHARACTER,
  POINTER, and whatever else might come along.

* Allow DATA VAR/.../ to come before COMMON /.../ ...,VAR,....  Then again,
  maybe it is better to have g77 always require placement of DATA so that
  it can possibly immediately write constants to the output file, thus
  saving time and space?  That is, DATA A/1000000*1/ should perhaps always
  be immediately writable to canonical assembler, unless it's already known
  to be in a COMMON area following as-yet-uninitialized stuff, and to do
  this it cannot be followed by COMMON A.

* Character-type selector/cases for SELECT CASE.

* Option to initialize everything not explicitly initialized to "weird"
  (machine-dependent) values, e.g. NANs, bad (non-NULL) pointers, and
  "-0" integers.  Right now, only -finit-local-zero is supported, which
  initializes local vars to binary zeros.

* Add run-time bounds-checking of array/subscript references a la f2c.
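
  f2c's -C option generates such subscript checks.  Below is a sketch of
  the kind of check a compiler could emit for a two-dimensional array
  A(L1:U1,L2:U2), which Fortran stores in column-major order (first
  subscript varies fastest).  The function name, checking each subscript
  separately, and returning -1 instead of stopping the program are all
  assumptions for illustration.

```c
/* Hypothetical checked offset computation for A(L1:U1, L2:U2).
   Returns the 0-based element offset, or -1 if either subscript is out
   of range.  A real implementation would instead report the array name
   and source location, then stop. */
static long checked_offset_2d(long i, long j,
                              long l1, long u1, long l2, long u2)
{
    if (i < l1 || i > u1 || j < l2 || j > u2)
        return -1;
    /* Column-major: each column holds (u1 - l1 + 1) elements. */
    return (i - l1) + (j - l2) * (u1 - l1 + 1);
}
```

  A cheaper variant checks only the final offset against the array's total
  size; it catches fewer errors (A(11,1) in a 10x5 array is silently
  A(1,2)) but costs a single comparison.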

* Output labels for use by debuggers that know how to support them.  Same
  with weirder things like construct names.  It is not yet known if any
  debug formats or debuggers support these.

* Provide the g77/gdb support necessary for better native Fortran-language
  debugging.

* Support the POSIX standard for Fortran.

* Support DEC-style lossage of virtual blanks at end of source line
  if some command-line option specified.  This affects cases where
  a character constant is continued onto the next line in a fixed-form
  source file -- g77, and many other compilers, virtually extend
  the continued line through column 72 with blanks that become part
  of the character constant, but DEC Fortran normally didn't.  (Fairly
  recently, at least one version of DEC Fortran was enhanced to provide
  the g77 behavior when a command-line option is specified, apparently due
  to demand from readers of the USENET group comp.lang.fortran.  It'd
  be nice to return the favor!)

* Consider a preprocessor designed specifically for Fortran to replace
  cpp -traditional.  There are several on the 'net to look at.

* Support OPEN(...,KEY=(...),...).

* OPEN(NOSPANBLOCKS,...) is treated as OPEN(UNIT=NOSPANBLOCKS,...), so a
  later UNIT= in the same statement is invalid.  Make sure this is
  what DEC Fortran users expect.

* Currently we disallow READ(1'10) since it is an obnoxious syntax, but
  supporting it might be pretty easy if needed (more details needed, such
  as whether general expressions separated by an apostrophe are supported,
  or maybe the record number can be a general expression, &c).

* Support STRUCTURE/UNION/MAP/RECORD fully.  Currently no support at all
  for %FILL in STRUCTURE and related syntax, whereas the rest of the
  stuff has at least some parsing support.  This requires either major
  changes to libf2c or its replacement.

* F90 and g77 probably disagree about label scoping relative to INTERFACE/
  END INTERFACE and their contained SUBROUTINE/FUNCTION interface bodies
  (blocks?).

* F90: ENTRY doesn't support RESULT() yet, since that was added after S8.112.

* F90: Empty-statement handling (10 ;;CONTINUE;;) probably isn't consistent
  with the final form of the standard (it was vague at S8.112).

* It seems to be an "open" question whether a file, immediately after being
  OPENed, is positioned at the beginning, the end, or wherever -- it might
  be nice to offer an option of opening to "undefined" status, requiring
  an explicit absolute-positioning operation to be performed before any
  other (besides CLOSE) to assist in making applications port to systems
  (some IBM?) that OPEN to the end of a file or some such thing.

4. Generalize the machine model.

* Switch to using REAL_VALUE_TYPE to represent REAL/DOUBLE constants
  exclusively so the target float format need not be required.  This
  means changing the way g77 handles initialization of aggregate areas
  having more than one type, such as REAL and INTEGER, because currently
  it initializes them as if they were arrays of "char" and uses the
  bit patterns of the constants of the various types in them to determine
  what to stuff in elements of the arrays.

* Rely more and more on back-end info and capabilities, especially in the
  area of constants (where having the g77 front-end's IL just store
  the appropriate tree nodes containing constants might be best).

* Suite of C and Fortran programs that a user/administrator can run on a
  machine to help determine the configuration for GNU Fortran before building
  and help determine if the compiler works (especially with whatever
  libraries are installed) after building.

5. Useful warnings.

* Have -Wunused warn about unused labels.

* Warn about assigned GOTO/FORMAT usage without any ASSIGN to the variable
  (actually, use of `-O -Wuninitialized' should take care of most of these).

* Have -Wsurprising (or something -- not by default) warn about use of
  non-standard intrinsics without explicit INTRINSIC statements for them
  (to help find code that might fail silently when ported to another
  compiler).

* Support -fpedantic more thoroughly, and use it only to generate
  warnings instead of rejecting constructs outright.  Have it warn:
  if a variable that dimensions an array is not a dummy or placed
  explicitly in COMMON (the 77 standard does not allow it to be
  placed in COMMON via EQUIVALENCE); if specification statements
  follow statement-function-definition statements; about all sorts of
  syntactic extensions.

* Warn about modifying DO variables via EQUIVALENCE.  This test might
  be useful in setting the "doiter" flag for a variable or even array
  reference within a loop, since that might produce faster code someday.

* Warn if brain-damage auto-decimal-convert-constant-to-REAL*8
  feature might be expected in source (if such warnings are enabled); for
  example, warn in cases like "parameter (pi=3.14159);foo=pi*3d0;" because
  apparently in these and other cases, some compilers append decimal zeros
  to the original single-precision constant and convert the result to
  double precision -- though undoubtedly they use an easier equivalent
  implementation (and I suppose g77 could, too, if this kind of dangerous
  feature were actually more useful than just fixing the source).
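
  The arithmetic behind this warning: widening the single-precision value
  of 3.14159 to double precision is not the same as reading "3.14159" as a
  double-precision constant, which is what appending decimal zeros amounts
  to.  A small C check (the helper names are hypothetical):

```c
#include <stdlib.h>

/* Standard behavior: PI is a single-precision constant, widened to
   double precision only when used in PI*3D0. */
static double widened_single(const char *literal)
{
    return (double) strtof(literal, NULL);
}

/* "Auto-decimal-convert" behavior: treat the original decimal text as
   if it had been written as a double-precision constant. */
static double reparsed_double(const char *literal)
{
    return strtod(literal, NULL);
}
```

  The two results differ by roughly 1e-7, far above double-precision
  resolution, so code that silently relies on either behavior computes
  different answers on different compilers.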

6. Better documentation.

* Convert existing documentation into the format(s) used by gcc, for
  all the right reasons.

* Better info on how g77 works and how to port it.

7. Better internals.

* Generally make expression handling focus more on critical syntax stuff,
  leaving semantics to callers.  E.g.
  anything a caller can check, semantically, let it do so, rather
  than having expr.c do it.  (Exceptions might include things like
  diagnosing "FOO(I--K:)=BAR" where FOO is a PARAMETER -- if it seems
  important to preserve the left-to-right-in-source order of production
  of diagnostics.)

* Come up with better naming conventions for -D to establish requirements
  to achieve desired "language" via proj.h.

* In global, clean up used tokens and ffewheres in _terminate_1.

* Replace sta outpooldisp mechanism with malloc_pool_use.

* Check for opANY in more places in com.c, std.c, and ste.c, and get
  rid of the opCONVERT(opANY) kludge (after determining if there is
  indeed no real need for it).

* Utility to read and check bad.def msgs and their references in the
  code, to make sure calls are consistent with message templates.

* Make a symbol dumper for standalone FFE so testing can be more exhaustive.

* Search and fix "&ffe" and similar so that "ffe...ptr..." macros are
  available instead (a good argument for wishing we could have written all
  this stuff in C++, I suppose).

* Some modules truly export the member names of their structures (and the
  structures themselves); maybe fix this, and also fix other modules that
  only appear to do so (by appending "_", though that would be ugly and
  probably not worth the time).

* Implement C macros RETURNS(value) and SETS(something,value) in proj.h
  and use them throughout FFE source code so they can be tailored to catch
  code writing into a RETURNS() or reading from a SETS().
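
  One possible reading of this item, sketched with hypothetical
  definitions (no such macros exist in proj.h yet): in checking builds,
  RETURNS() yields a non-lvalue, so assigning through an accessor fails to
  compile, and SETS() yields void, so using a setter's result fails to
  compile.  The sym structure and accessor names are invented.

```c
/* Enable the checking variants for this sketch. */
#define FFE_CHECK_ACCESSORS 1

#ifdef FFE_CHECK_ACCESSORS
/* In C, a comma expression is not an lvalue, so
   "sym_name (s) = x" is rejected at compile time. */
#define RETURNS(value) ((void) 0, (value))
/* A void result means "y = sym_set_name (s, x)" is rejected. */
#define SETS(something, value) ((void) ((something) = (value)))
#else
#define RETURNS(value) (value)
#define SETS(something, value) ((something) = (value))
#endif

struct sym {
    const char *name;
};

#define sym_name(s) RETURNS ((s)->name)
#define sym_set_name(s, n) SETS ((s)->name, (n))
```

  Ordinary uses compile identically either way; only misuse changes from
  silently accepted to a compile-time error.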

* Decorate throughout with "const" and other such stuff.

* All F90 notational derivations in the source code are still based
  on the S8.112 version of the draft standard.  Probably should update
  to the official standard, or put documentation of the rules as used
  in the code...uh...in the code.

* Some ffebld_new calls (those outside of ffeexpr or inside but invoked
  via paths not involving ffeexpr_lhs or ffeexpr_rhs) might be creating things
  in improper pools, leading to such things staying around too long or
  (doubtful, but possible and dangerous) not long enough.

* Some ffebld_list_new (or whatever) calls might not be matched by
  ffebld_list_bottom (or whatever) calls, which might someday matter.

* Probably not doing clean things when we fail to EQUIVALENCE something
  due to alignment/mismatch or other problems -- they end up without
  ffestorag objects, so maybe the backend (and other parts of the front
  end) can notice that and handle like an "opANY" (do what it wants, just
  don't complain or crash).  Most of this seems to have been addressed
  by now, but a code review wouldn't hurt.

8. Better diagnostics.

* Implement non-F90 messages (especially avoid mentioning F90 things g77
  doesn't yet support).  Much of this has been done as of 0.5.14.

* Generally continue processing for warnings and recoverable (user)
  errors whenever possible -- don't gratuitously make bad code.  Example:
  INTRINSIC ZABS;CALL FOO(ZABS);END when -ff2c-intrinsics-disable should
  complain about passing ZABS but still compile, instead of rejecting
  the entire CALL statement (some of this is related to improving sta.c
  to do the statement-preprocessing work).

* If -fno-ugly, reject badly designed trailing-radix quoted (typeless)
  numbers, such as '123'O.
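
  For reference, the value such a constant denotes can be computed as
  below.  Fortran 90 writes these constants with a leading radix letter
  instead (for example, O'123'), with the same digits and radix.  The
  helper name is hypothetical, and accepting X as a synonym for Z is an
  assumption of this sketch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Value of a typeless constant given its digit string and radix letter.
   Returns 0 and stores the value in *OUT on success, or -1 for an
   unknown radix letter, an invalid digit, or overflow. */
static int typeless_value(const char *digits, char radix_letter,
                          unsigned long *out)
{
    int base;
    char *end;

    switch (toupper((unsigned char) radix_letter)) {
    case 'B': base = 2; break;
    case 'O': base = 8; break;
    case 'Z':
    case 'X': base = 16; break;
    default: return -1;
    }
    /* strtoul would otherwise accept leading blanks and signs. */
    if (!isxdigit((unsigned char) digits[0]))
        return -1;
    errno = 0;
    *out = strtoul(digits, &end, base);
    if (*end != '\0' || errno == ERANGE)
        return -1;
    return 0;
}
```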

* -Wugly*, -Wautomatic, -Wvxt-not-f90 (syn. -Wf90-not-vxt), -Wf90, and so
  on all should flag places (via diagnostics) where ambiguities
  are found.

* -Wconversion and related should flag places where non-standard
  conversions are found.  Perhaps much of this would be part of
  -Wugly*.

* When FUNCTION and ENTRY point types disagree (CHARACTER lengths,
  type classes, &c), ANY-ize the offending ENTRY point and any _new_ dummies
  it specifies.

* Complain when list of dummies containing an adjustable dummy array does
  not also contain every variable listed in the dimension list of the
  adjustable array.  Currently g77 does complain about a variable that
  dimensions an array but doesn't appear in any dummy list or COMMON area,
  but this needs to be extended to catch cases where it doesn't appear in
  every dummy list that also lists any arrays it dimensions.

* Make sure things like RETURN 2HAB are invalid in both source forms (must
  be RETURN (2HAB), which probably still makes no sense but at least can
  be reliably parsed).  Fixed form rejects it, but not free form, except
  in a way that is a bit difficult to understand.

* Speed up and improve error handling for DATA when a repeat count is
  specified, as in "integer x(20);continue;data (x(i),j=1,20)/20*5/;end",
  so that 20 messages don't come out after the important one.

9. More library routines.

* The sort of routines usually found in the BSD-ish libU77 should be
  provided in addition to the few utility routines in libF77.  Some of
  this work has been done.
