Data-dependent bugs in TIP #457

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Data-dependent bugs in TIP #457

Christian Gollwitzer
Dear colleagues,

I am seeing data-dependent bugs in TIP457, both in the design as well as
in the current implementation, and I want to explain why I think this
should be fixed.

Sorry for the long text, but I have the feeling that my previous
concerns regarding this topic were not fully understood.

tl;dr: Please do not switch the interpretation of arguments based on the
string content of the actual argument.

1. *====== INTRODUCTION: Data Dependent Bugs ======================*

By data-dependent or data-driven bugs I mean a bug that is not exposed
during regular testing, but suddenly surfaces when a special input is
given to the program, be it accidentally or maliciously constructed.
Good (or rather bad) examples of this kind of bug are SQL injections,
unbraced exprs, or treating user input as a list. They are among the
worst possible bugs, because they remain undiscovered during regular
testing, even at 100% coverage.

Before coming to the concrete problem with TIP 457, let's have a look at
another example inside of Tcl.

2. *=========== Example: Data Dependent Bug in file & open ============*

The following code is a cut-down version of some real code that appeared
within the Tk file open dialog box:

   foreach f [glob *] {
      puts [file exists $f]
   }

This code contains a data-dependent bug which is non-obvious at first
glance. Regular testing does not reveal the bug:

   (test) 55 % ls
   /Users/chris/Sources/test:
   a   b
   (test) 56 % foreach f [glob *] { puts "$f [file exists $f]" }
   a 1
   b 1


This code fails, however, if an MS Word file is opened in the directory.
MS Word creates lock files with a name similar to this:

   (test) 57 % touch ~mist
   (test) 58 % foreach f [glob *] { puts "$f [file exists $f]" }
   a 1
   b 1
   ~mist 0


So glob enumerates a file which the file command ensemble is unable to
work on. Also "open" does not open this file returned by glob. To
correct the bug, the code has to be changed into:

        foreach f [glob *] { puts "$f [file exists ./$f]" }

The reason for this behaviour is a (mis-)feature, not a bug, in the file
handling of Tcl. The leading tilde ~ of the file ~mist is interpreted as
the specifier for "home directory", and the "./" is the quoting
mechanism to prevent this expansion. This means that ALL Tcl programs
which open a file from a variable name are buggy:

   (test) 98 % foreach f [glob *] { puts [open $f] }
   file16
   file17
   user "mist" doesn't exist

This code

        set fd [open $fn r]

is generally wrong, it should be

        set fd [open ./$fn r]

unless $fn is an absolute path starting with a slash, or a Windows drive
letter, in which case $fn would be correct. So the final correct way to
open a file, given the file name in $fn, is

        if {[string index $fn 0] eq "~"} {
           set fd [open ./$fn r]
        } else
           set fd [open $fn r]
        }


How often do you see this code, and how often do you see simply
       
        open $fd r

?

3. *============ Data dependent bug in TIP 457 =====================*

We have seen in the previous paragraph, that a data dependent behaviour
in core commands, in that case all file handling commands, leads to a
data dependent bug in the code that people write building upon these
commands. The programmer using "open" is completely *unaware* that there
is an invisible switch inside the core file commands, and simply relies
on them, that they open just the file with the given name. The root
problem is an invisible switch inside "open" & friends.

The problem with TIP 457 is that it contains an invisible switch, which
decides if a given actual argument is a name or an argument, depending
on the string of the actual argument. If it starts with "-" it is
(almost) unconditionally treated as a name, whereas if not, it is an
argument.

This gives rise to problems when named arguments are mixed with
positional arguments. Consider the following attempt to simulate a
subset of lsort:

   proc mylsort {{ordering -default ascii -switch {ascii dictionary
integer real}} list} {
        puts "Sort mode $ordering"
        lsort -$ordering $list
   }

It works with usual input:

   (test) 117 % set list {value some other thing}
   value some other thing
   (test) 118 % mylsort -dictionary $list
   Sort mode dictionary
   other some thing value

but fails when a dash is present [1]:

   (test) 119 % set list {-value some other thing}
   -value some other thing
   (test) 120 % mylsort -dictionary $list
   wrong # args: should be "mylsort
?|-ascii|-dictionary|-integer|-real|? list"

whereas the original lsort has no problem:

   (test) 121 % lsort -dictionary $list
   -value other some thing

It even breaks EIAS [1]:
       
   (test) 122 % lindex $list 0
   -value
   (test) 123 % mylsort -dictionary $list
   Sort mode dictionary
   -value other some thing

The remedy to this as per TIP 457 is the -- switch:

   (test) 124 % mylsort -dictionary -- $list
   Sort mode dictionary
   -value other some thing

This always works, but it puts the burden on the programmer to add -- at
all invocations. The current core "lsort" shows that the arguments are
never ambiguous, the assignment can be resolved by counting. The last
arg is the list, everything before is treated as an option. This
strategy always works if a fixed number of positional arguments is at
the beginning and/or end of the argument list. If core "lsort" were to
be implemented in TIP457, then ALL Tcl programs using lsort without the
-- switch would become buggy with a data dependent bug.

4. *=========== Possible solutions to the issue ====================*

In some cases, the assignment of an actual argument to the named or
positional parameters can be based on counting only. If there is a fixed
number of non-optional positional arguments at the beginning or the end
of the argument list, this can be assigned first and then the rest is
processed as options. Many core commands follow this strategy, e.g.
"lsort", "lsearch" or "canvas create object coordlist ?-options?". In
other cases it cannot be decided where to split between options and
values; -- is needed to separate the two. This is the case when both
parts, the named args and the positional args, are of variable length.

A possible solution would be first to detect, if the argument list can
be parsed unambiguously. In that case, create an interface like lsort.
Second, if it is determined, that there is ambiguity, this could raise
an error. Either the new proc-457 *disallows* mixing variable number of
arguments with named arguments, or it throws a runtime error *upon
invocation*, if "--" was not seen in the argument list. This would catch
the ambiguity early on, while the code is developed, not some time in
the future when accidentally a value with a leading dash is passed.

It might also be possible to come up with a set of rules, how to assign
the arguments in a way that the common cases do not bear a surprise.
While it might be tricky to derive these rules or prove that they work
as intended, it could be a way to formalise the sensible expectation. I
am still hoping that Kevin Kenny can find the rules he formulated some
time back.

In addition to making the code safer, the static binding of arguments
can also make the code faster. In general, string representations should
not be generated for data which is passed into a proc. The TIP code
already inspects the value to see if it is a list, but e.g. a huge
VecTcl array can not even have a strig representation, because it would
not fit into memory, but the current code would force it to compute the
string in many cases.

5. *================= Conclusion =================================

I hope that this long tractate will be read and contributes to the
development of TIP 457; all in al this TIP is not bad and there hase
been a lot of work going into the amalgamation of a syntax which seems
very Tclish and a good overall compromise. All I'm asking for is a way
to implement this in a way which does not tempt people into writing
buggy code.

        Christian

[1] Note: this one is tricky to reproduce, because of the history
caching / literal caching. It only works in a fresh tclsh.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Brad Lanam-2
On Sun, May 21, 2017 at 10:15 AM, Christian Gollwitzer <[hidden email]> wrote:
This code fails, however, if an MS Word file is opened in the directory.
MS Word creates lock files with a name similar to this:

   (test) 57 % touch ~mist
   (test) 58 % foreach f [glob *] { puts "$f [file exists $f]" }
   a 1
   b 1
   ~mist 0


And this is a real pain.
Something created a temporary folder named '~something' in %TEMP% that caused me problems.

I understand that the ~ prefix is supported across different systems so that Tcl works the same way
everywhere, but these system specific cultural differences cause problems like Christian has pointed
out.  Files named ~something are very rare in the unix world, and this is much less of an issue there.

To get back on topic, if the named argument code can be modified to do the right thing whenever
possible as Christian has suggested, that would be great.


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Mathieu Lafon
In reply to this post by Christian Gollwitzer
Thanks Christian for this very detailed and interessant write-up.

> but fails when a dash is present [1]:
>
>    (test) 119 % set list {-value some other thing}
>    -value some other thing
>    (test) 120 % mylsort -dictionary $list
>    wrong # args: should be "mylsort
> ?|-ascii|-dictionary|-integer|-real|? list"
>
> whereas the original lsort has no problem:
>
>    (test) 121 % lsort -dictionary $list
>    -value other some thing
>
> It even breaks EIAS [1]:
>
>    (test) 122 % lindex $list 0
>    -value
>    (test) 123 % mylsort -dictionary $list
>    Sort mode dictionary
>    -value other some thing

Well spotted! It's because the internal type can change during time:
what is initially a string become a list after the [lindex] call, and
trigger the multi-elements test which end the named-group handling.
Testing if the string can be an array would be possible, but would
forbid the use of spaces in names.

I understand there is a coherency problem here that must be addressed.

> This always works, but it puts the burden on the programmer to add -- at
> all invocations.

Unix has several decades of history using an end-of-option marker to
end a named group to use an argument with a leading dash or to use an
external/unverified argument which may start with a dash. We are all
used to it, I don't think it is a burden.

> A possible solution would be first to detect, if the argument list can
> be parsed unambiguously. In that case, create an interface like lsort.

You have nearly convice me on that case (especially if we want to
replace similar functions without modifying the usage), although the
user's documentation will probably be a little more complicated. I
will think about it.

> Second, if it is determined, that there is ambiguity, this could raise
> an error. Either the new proc-457 *disallows* mixing variable number of
> arguments with named arguments, [...]

I don't think we should disallow that.

> [...] or it throws a runtime error *upon
> invocation*, if "--" was not seen in the argument list.

That would be the safer thing to do. *But* searching for "--" would
still require to stringify the argument list (although most non-string
objects can probably be easily skipped in that case).

> It might also be possible to come up with a set of rules, how to assign
> the arguments in a way that the common cases do not bear a surprise.
> While it might be tricky to derive these rules or prove that they work
> as intended, it could be a way to formalise the sensible expectation. I
> am still hoping that Kevin Kenny can find the rules he formulated some
> time back.

If your talking about rules which can prevent users to explicitely use
"--" for common cases, I'm interested to hear about them.

> [...] all in al this TIP is not bad and there hase
> been a lot of work going into the amalgamation of a syntax which seems
> very Tclish and a good overall compromise.

Thank you for your support.

-- Mathieu

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Peter da Silva-2


On 5/21/17, 5:49 PM, "Mathieu Lafon" <[hidden email]> wrote:
> Thanks Christian for this very detailed and interessant write-up.
>  
 >> but fails when a dash is present [1]:
 >>
 >>    (test) 119 % set list {-value some other thing}
 >>    -value some other thing
 >>    (test) 120 % mylsort -dictionary $list
 >>    wrong # args: should be "mylsort
 >> ?|-ascii|-dictionary|-integer|-real|? list"
 >>
 >> whereas the original lsort has no problem:
 >>
 >>    (test) 121 % lsort -dictionary $list
 >>    -value other some thing
 >>
 >> It even breaks EIAS [1]:
 >>
 >>    (test) 122 % lindex $list 0
 >>    -value
 >>    (test) 123 % mylsort -dictionary $list
 >>    Sort mode dictionary
 >>    -value other some thing
>
> Well spotted! It's because the internal type can change during time:
> what is initially a string become a list after the [lindex] call, and
> trigger the multi-elements test which end the named-group handling.
> Testing if the string can be an array would be possible, but would
> forbid the use of spaces in names.

What we are seeing here is the ad-hoc nature of argument parsing in traditional Tcl code. Since only one of the options is possible, there is no legal way for there to be an additional option after “-dictionary”, so lsort can handle that differently. If all your argument parsing is ad-hoc, you can look for special cases. When you’re implementing a general-purpose parser, you should pick some behavior and stick to it.

In the case of TIP#457, you can choose two possible behaviors:

1. Unknown options are treated as strings.
2. Unknown options are an error.

The designers of TIP#457 chose the latter. You can argue that it would be better to choose the former. I am ambivalent. Option 1 will allow more sloppy programs that have hidden option injection bugs to work, most of the time. From a security point of view that’s a problem. But it would avoid this difference in behavior.

In addition, there’s the decision whether to option names can have spaces in them. When I suggested treating multi-element lists specially, there was a hidden assumption in my suggestion: that they could never have spaces in them. If they can, then you need to keep track of the largest number of spaces in a legal option.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Andreas Leitgeb
Peter da Silva <[hidden email]> wrote:
> On 5/21/17, 5:49 PM, "Mathieu Lafon" <[hidden email]> wrote:
>>> but fails when a dash is present [1]:
>>>    (test) 119 % set list {-value some other thing}
>>>    -value some other thing
>>>    (test) 120 % mylsort -dictionary $list
>>>    wrong # args: should be "mylsort
>>> ?|-ascii|-dictionary|-integer|-real|? list"

The usecase of the lsort/lsearch-like API cannot be solved by any ana-
lysis of argument values. It needs to be done by counting the required
arguments at proc-declaration time, and keeping track of the remaining
required arguments, making sure never to spend more of the call time
arguments on options than leaves enough of them for the required params.

E.g.:
  proc lsort {{cmd -name command} list} { ... }

The count of required arguments is 1, so of the number of args passed to
the call, one is "reserved". The total number of args thus needs to be
at least >= 3 to even consider testing the first arg for being an option.

Once this is implemented, both this "data driven bug" and the EIAS-
violation (that only crept in by a misguided attempt to avoid certain
data-driven conditions) are solved.

PS: my procx implementation does it, but does much more (partially
  covering TIP 288) so its code is even more complicated than would
  be necessary for just supporting lsort/lsearch API.


> In the case of TIP#457, you can choose two possible behaviors:
> 1. Unknown options are treated as strings.
> 2. Unknown options are an error.

I'm glad that mlafon did "2.". (out of laziness I only implemented "1."
in procx.)

> In addition, there?s the decision whether to option names can have
> spaces in them.

While there is no usecase for such options, there is just no point
in even checking for it. Any argument that is put at a position right
after options (and not one of the trailing required params) either
needs the "--" or is eligible to be tested as an option.

Any proc following an ANTI-pattern like the following would require
the caller to use "--" pretty often or get his lists shimmered:
 proc badlsort {{cmd -name command} list {ascOrDesc asc}} { ... }
I guess, Colin will go all *gasp* over this one, but really it
wouldn't be any less so with eatargs or ::faproc.


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Christian Gollwitzer
Am 22.05.17 um 16:55 schrieb Andreas Leitgeb:

> Peter da Silva <[hidden email]> wrote:
>> On 5/21/17, 5:49 PM, "Mathieu Lafon" <[hidden email]> wrote:
>>>> but fails when a dash is present [1]:
>>>>     (test) 119 % set list {-value some other thing}
>>>>     -value some other thing
>>>>     (test) 120 % mylsort -dictionary $list
>>>>     wrong # args: should be "mylsort
>>>> ?|-ascii|-dictionary|-integer|-real|? list"
>
> The usecase of the lsort/lsearch-like API cannot be solved by any ana-
> lysis of argument values. It needs to be done by counting the required
> arguments at proc-declaration time, and keeping track of the remaining
> required arguments, making sure never to spend more of the call time
> arguments on options than leaves enough of them for the required params.
>
> E.g.:
>    proc lsort {{cmd -name command} list} { ... }
>
> The count of required arguments is 1, so of the number of args passed to
> the call, one is "reserved". The total number of args thus needs to be
> at least >= 3 to even consider testing the first arg for being an option.

Thank you very much for clarifying this. I think that it's worth to
implement this argument-counting precisely because it is the most useful
case of mixing named and positional arguments. After thinking about it
again, in my opinion it would be best to error out early if there is the
chance of a data-driven bug.

All-in-all, I am suggesting the following arg binding behaviour:
The general format of the arglist should be

        ?positional? ?named? ?positional? ?args?

Case 1: Unambiguous list of args:
=================================
1) If there is a fixed number of positional args (i.e. non-defaulted) at
the beginning of the arg list, assign the actual args to them *without
inspecting* and drop them from the list

2) If there is a fixed number of positional args at the end of the list,
i.e. no defaulted and no "args", assign the actual args to them and drop
them.

3) Process the remaining args acoording to the named specs. If the last
one is "--", ignore it. If "--" is not the last arg, error WrongNumArgs

Case 2: Ambiguous list of args
===============================
If the list of positional args at the end is non-fixed, i.e. there are
defaulted args, or there is "args", then

1) Assign the fixed-list of positional args from the start as in case 1

2) Process the rest of args as named, like it is now

3) As soon as "--" is seen, switch over to the positional args. If "--"
is not seen and the next supposedly named arg does not start with a
dash, throw an error "Ambiguous arguments, insert -- for end of options"


Case 3: Erroneous arg definition
=================================

If there is a variable number of positional args *before* the named
args, throw an error upon proc definition. I think there is no sensible
way to handle this.

The rules described above allow for the most useful / common ways to mix
named args and positional args. These are:

lsearch ?-options? haystack needle
.c creat rect {1 2 3 4} -fill red -tag xy
(Case 1, no -- required)

rm ?-options? -- file1 file2 ...
(Case 2, unlike rm it throws an error of -- is left off. This prevents
rm $file with somebody setting file to "-rf")

Case 2 prevents future data driven bugs at first invocation, i.e. during
the development of the program. Case 1 removes the necessity to insert
"--" in the well-defined scenarios. Both cases prevent inspection, and
hence shimmering, of data arguments.

I am sorry that I do not have the time to work on the code. I appreciate
the work and your willpower to push this TIP forward, against all odds.

        Christian

PS: Would it be an acceptible solution to put the current implementation
as tcl::unsupported::proc for testing it over the course of a year,
polishing out all bugs which appear in real world code, and moving it to
the core "proc" once it seems stable? I think that most of the
"opponents" are simply scared of the consequences of changing such a
central feature as proc.


>
> Once this is implemented, both this "data driven bug" and the EIAS-
> violation (that only crept in by a misguided attempt to avoid certain
> data-driven conditions) are solved.
>
> PS: my procx implementation does it, but does much more (partially
>    covering TIP 288) so its code is even more complicated than would
>    be necessary for just supporting lsort/lsearch API.
>
>
>> In the case of TIP#457, you can choose two possible behaviors:
>> 1. Unknown options are treated as strings.
>> 2. Unknown options are an error.
>
> I'm glad that mlafon did "2.". (out of laziness I only implemented "1."
> in procx.)
>
>> In addition, there?s the decision whether to option names can have
>> spaces in them.
>
> While there is no usecase for such options, there is just no point
> in even checking for it. Any argument that is put at a position right
> after options (and not one of the trailing required params) either
> needs the "--" or is eligible to be tested as an option.
>
> Any proc following an ANTI-pattern like the following would require
> the caller to use "--" pretty often or get his lists shimmered:
>   proc badlsort {{cmd -name command} list {ascOrDesc asc}} { ... }
> I guess, Colin will go all *gasp* over this one, but really it
> wouldn't be any less so with eatargs or ::faproc.
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Tcl-Core mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/tcl-core
>


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Alpha Releases

Donald Porter

> PS: Would it be an acceptible solution to put the current implementation
> as tcl::unsupported::proc for testing it over the course of a year,
> polishing out all bugs which appear in real world code, and moving it to
> the core "proc" once it seems stable?

This is called an alpha release.  It is marked by an interpreter where

        [package present Tcl]

returns a version number like:

        8.7a0

All features of an 8.7a* release that are different from an 8.6.* release
are considered to be in a period of testing for people to try over time
until the release of 8.7.0 locks them into place. This often takes at least
a year.

With that established practice already in place, I do not think further
efforts with tcl::unsupported placement are a good idea.  It just becomes
a barrier to actually doing the testing needed.

DGP

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Andreas Leitgeb
In reply to this post by Christian Gollwitzer
Christian Gollwitzer <[hidden email]> wrote:
> After thinking about it again, in my opinion it would be best
> to error out early if there is the chance of a data-driven bug.

After also thinking about it again (since our last chat), I think
that your suggestion is a tad too strict now.

First of all, it is not uncommon for a command/proc to expect
"constrained" strings for certain arguments. Common examples for
such constrained strings are: channel names, widget path names,
tokens such as array names in a fixed namespace (http package), ...

Having a "--" required (by the semantics-agnostic Tcl) even before
semantically constrained parameters is too limiting, imho.
Following that strict logic, puts would always need a "--"  ;-)

> Case 3: Erroneous arg definition
> =================================
> If there is a variable number of positional args *before* the named
> args,

This one is not really ambiguous. Options would only be considered,
if all positional params are assigned, and call arguments remain.

The only thing that really makes no sense are *required* named
arguments following non-required positional ones - in that case,
all preceding defaulted positionals would become de facto required.

> PS: Would it be an acceptible solution to put the current implementation
> as tcl::unsupported::proc for testing it over the course of a year, ...

Here, I agree with Don:
< This is called an alpha release. [...]


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Peter da Silva-2
In reply to this post by Christian Gollwitzer
On 5/22/17, 4:42 PM, "Christian Gollwitzer" <[hidden email]> wrote:
> 3) Process the remaining args acoording to the named specs. If the last one is "--", ignore it. If "--" is not the last arg, error WrongNumArgs

Wording needs to be cleared about cases like “frob -named -- -- positional” where the first “--“ is the literal value of $named.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Mathieu Lafon
In reply to this post by Andreas Leitgeb
On Tue, May 23, 2017 at 2:01 PM, Andreas Leitgeb <[hidden email]> wrote:

> Christian Gollwitzer <[hidden email]> wrote:
>> After thinking about it again, in my opinion it would be best
>> to error out early if there is the chance of a data-driven bug.
>
> After also thinking about it again (since our last chat), I think
> that your suggestion is a tad too strict now.
>
> [...]
>
> Having a "--" required (by the semantics-agnostic Tcl) even before
> semantically constrained parameters is too limiting, imho.
> Following that strict logic, puts would always need a "--"  ;-)

I also think that it is too strict. There are many cases which will
not cause any problem and for which the user will not want to
explicitely add "--".

Users should be responsible enough to know to explicitely use "--" if
a) the next argument starts with a dash
b) the next argument is a variable whose content may start with a dash
c) the next argument can be expensive to stringify

I don't think this is the same than your [file] and [open] example,
which is very specific to a context/platform (and does not seem to be
explicitely mentionned in the doc) and may even impact an experienced
user.

-- Mathieu

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Christian Gollwitzer
Am 23.05.17 um 15:46 schrieb Mathieu Lafon:

> On Tue, May 23, 2017 at 2:01 PM, Andreas Leitgeb <[hidden email]> wrote:
>> Christian Gollwitzer <[hidden email]> wrote:
>>> After thinking about it again, in my opinion it would be best
>>> to error out early if there is the chance of a data-driven bug.
>>
>> After also thinking about it again (since our last chat), I think
>> that your suggestion is a tad too strict now.
>>
>> [...]
>>
>> Having a "--" required (by the semantics-agnostic Tcl) even before
>> semantically constrained parameters is too limiting, imho.
>> Following that strict logic, puts would always need a "--"  ;-)
>
> I also think that it is too strict. There are many cases which will
> not cause any problem and for which the user will not want to
> explicitely add "--".
>
> Users should be responsible enough to know to explicitely use "--" if
> a) the next argument starts with a dash
> b) the next argument is a variable whose content may start with a dash
> c) the next argument can be expensive to stringify

However the decision is made in the end about the issue, I'm not sure if
this:

> I don't think this is the same than your [file] and [open] example,
> which is very specific to a context/platform (and does not seem to be
> explicitely mentionned in the doc) and may even impact an experienced
> user.
is a misunderstanding? The "file" and "open" problems persist on
Windows, MacOS and Linux without difference. On all platforms the
leading ~ is expanded by open/file into "Home directory of user X",
which introduces the bug. Try "glob ~*" on any platform and enjoy the
error message. Or even "glob -nocomplain ~*" :) Just accidentally it
surfaced on Windows because somebody actually created such a file.

Actually I would recommend to remove that ~ facility - which will break
backwards compatibility - because I think it will fix more programs than
destroy existing programs. Now, with "proc", we are in the convenient
situation that nobody relies on behaviour as given, so I am just
reminding to take this into consideration as well.

I am well aware that in the end it'll be a compromise between usability,
versatility, safety and easyness of implementation, so I'm not leaning
towards any exact rules for now. You have done a good job at balancing
these and the various opinions so far, so I'll shut up for now and let
you guys do the work ;)

        Christian

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Mathieu Lafon
Hello,

TIP and implementation have just been updated:

- The special handling of multi-elements-list to end the handling of a
named group has been removed as it could cause coherency issues as
reported ;

- The proposed solution for the unambigous case (fixed number of
arguments after named group) has been implemented but the wording (in
doc and TIP) is different because arguments are always assigned in
order ;

- The "--" end-of-option marker has not been made mandatory but a note
has been added in the doc (proc.n) and a dedicated section has been
added in the Discussion section of the TIP.

Regards.

-- Mathieu Lafon


On Tue, May 23, 2017 at 4:11 PM, Christian Gollwitzer <[hidden email]> wrote:

> Am 23.05.17 um 15:46 schrieb Mathieu Lafon:
>> On Tue, May 23, 2017 at 2:01 PM, Andreas Leitgeb <[hidden email]> wrote:
>>> Christian Gollwitzer <[hidden email]> wrote:
>>>> After thinking about it again, in my opinion it would be best
>>>> to error out early if there is the chance of a data-driven bug.
>>>
>>> After also thinking about it again (since our last chat), I think
>>> that your suggestion is a tad too strict now.
>>>
>>> [...]
>>>
>>> Having a "--" required (by the semantics-agnostic Tcl) even before
>>> semantically constrained parameters is too limiting, imho.
>>> Following that strict logic, puts would always need a "--"  ;-)
>>
>> I also think that it is too strict. There are many cases which will
>> not cause any problem and for which the user will not want to
>> explicitely add "--".
>>
>> Users should be responsible enough to know to explicitely use "--" if
>> a) the next argument starts with a dash
>> b) the next argument is a variable whose content may start with a dash
>> c) the next argument can be expensive to stringify
>
> However the decision is made in the end about the issue, I'm not sure if
> this:
>
>> I don't think this is the same than your [file] and [open] example,
>> which is very specific to a context/platform (and does not seem to be
>> explicitely mentionned in the doc) and may even impact an experienced
>> user.
> is a misunderstanding? The "file" and "open" problems persist on
> Windows, MacOS and Linux without difference. On all platforms the
> leading ~ is expanded by open/file into "Home directory of user X",
> which introduces the bug. Try "glob ~*" on any platform and enjoy the
> error message. Or even "glob -nocomplain ~*" :) Just accidentally it
> surfaced on Windows because somebody actually created such a file.
>
> Actually I would recommend to remove that ~ facility - which will break
> backwards compatibility - because I think it will fix more programs than
> destroy existing programs. Now, with "proc", we are in the convenient
> situation that nobody relies on behaviour as given, so I am just
> reminding to take this into consideration as well.
>
> I am well aware that in the end it'll be a compromise between usability,
> versatility, safety and easyness of implementation, so I'm not leaning
> towards any exact rules for now. You have done a good job at balancing
> these and the various opinions so far, so I'll shut up for now and let
> you guys do the work ;)
>
>         Christian
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Tcl-Core mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/tcl-core

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Christian Gollwitzer
Am 24.05.17 um 02:14 schrieb Mathieu Lafon:
  > TIP and implementation have just been updated:
>
> - The special handling of multi-elements-list to end the handling of a
> named group has been removed as it could cause coherency issues as
> reported ;
>
> - The proposed solution for the unambigous case (fixed number of
> arguments after named group) has been implemented but the wording (in
> doc and TIP) is different because arguments are always assigned in
> order ;

Great! I think that the formulation is very clear and much better than
my lengthy explanations.

> - The "--" end-of-option marker has not been made mandatory but a note
> has been added in the doc (proc.n) and a dedicated section has been
> added in the Discussion section of the TIP.

Maybe that could go into a "lint"-tool like nagelfar. Since the --
switch is always a static string, it can be statically checked against
the arg list.

TIP #457: YES ;)

        Christian


> Regards.
>
> -- Mathieu Lafon
>
>
> On Tue, May 23, 2017 at 4:11 PM, Christian Gollwitzer <[hidden email]> wrote:
>> Am 23.05.17 um 15:46 schrieb Mathieu Lafon:
>>> On Tue, May 23, 2017 at 2:01 PM, Andreas Leitgeb <[hidden email]> wrote:
>>>> Christian Gollwitzer <[hidden email]> wrote:
>>>>> After thinking about it again, in my opinion it would be best
>>>>> to error out early if there is the chance of a data-driven bug.
>>>>
>>>> After also thinking about it again (since our last chat), I think
>>>> that your suggestion is a tad too strict now.
>>>>
>>>> [...]
>>>>
>>>> Having a "--" required (by the semantics-agnostic Tcl) even before
>>>> semantically constrained parameters is too limiting, imho.
>>>> Following that strict logic, puts would always need a "--"  ;-)
>>>
>>> I also think that it is too strict. There are many cases which will
>>> not cause any problem and for which the user will not want to
>>> explicitely add "--".
>>>
>>> Users should be responsible enough to know to explicitely use "--" if
>>> a) the next argument starts with a dash
>>> b) the next argument is a variable whose content may start with a dash
>>> c) the next argument can be expensive to stringify
>>
>> However the decision is made in the end about the issue, I'm not sure if
>> this:
>>
>>> I don't think this is the same than your [file] and [open] example,
>>> which is very specific to a context/platform (and does not seem to be
>>> explicitely mentionned in the doc) and may even impact an experienced
>>> user.
>> is a misunderstanding? The "file" and "open" problems persist on
>> Windows, MacOS and Linux without difference. On all platforms the
>> leading ~ is expanded by open/file into "Home directory of user X",
>> which introduces the bug. Try "glob ~*" on any platform and enjoy the
>> error message. Or even "glob -nocomplain ~*" :) Just accidentally it
>> surfaced on Windows because somebody actually created such a file.
>>
>> Actually I would recommend to remove that ~ facility - which will break
>> backwards compatibility - because I think it will fix more programs than
>> destroy existing programs. Now, with "proc", we are in the convenient
>> situation that nobody relies on behaviour as given, so I am just
>> reminding to take this into consideration as well.
>>
>> I am well aware that in the end it'll be a compromise between usability,
>> versatility, safety and easyness of implementation, so I'm not leaning
>> towards any exact rules for now. You have done a good job at balancing
>> these and the various opinions so far, so I'll shut up for now and let
>> you guys do the work ;)
>>
>>          Christian
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Tcl-Core mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/tcl-core
>


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Data-dependent bugs in TIP #457

Mathieu Lafon
On Wed, May 24, 2017 at 8:01 AM, Christian Gollwitzer <[hidden email]> wrote:
> Great! I think that the formulation is very clear and much better than
> my lengthy explanations.

Not being a native English speaker, I'm not sure it is as clear as I
would have wanted, but thanks anyway.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

TIP#457 is ill-considered

aspect
In reply to this post by Mathieu Lafon
On 23/05/17 23:16, Mathieu Lafon wrote:
> Users should be responsible enough to know to explicitely use "--" if
> a) the next argument starts with a dash
> b) the next argument is a variable whose content may start with a dash
> c) the next argument can be expensive to stringify

Please, no.  We know from experience that putting a foot-gun in users'
hands does not end well.  [proc] should not encourage programmers to
create fragile interfaces.  Tcl knows all too well the legacy of such
problems.


As has already been mentioned in this thread, deterministic binding can
be achieved by having the variable-length arglist in one position.
There has already been a proposal to enhance proc in this direction:
TIP#288.  With your work as inspiration, I have implemented a version of
this on branch aspect-tip288, taking methodology from a script-level
solution I've been happily using for years.  The implementation binds
args deterministically, without shimmering or breaking EIAS.  Its use
should be entirely obvious to anyone familiar with [proc], without
several pages more documentation are multiple warnings about obscure
risky patterns to avoid.

I suggest that this should be considered as a starting point for proc
enhancement, with the advanced features of TIP#457 layered carefully on
top.  My preference would be for something like [eatargs] built on top
of Tcl_GetIndexFromObj, so that fancyprocs could comfortably resemble
familiar core commands in their behaviour.


Aside from the obvious "foot-gun" described above, I am uncomfortable
with the existing proposal in many ways:

  - it is a radical change to argument binding, implemented in a way
that can only be extended or improved by future TIPs.

  - it does not seem to be informed by practice (Tk, XOTcl, TIP#288,
TIP#65, rl_parseargs, the Tcl_GetIndexFromObj pattern, many script level
implementations)

  - it still commits EIAS violations and unneccessary shimmers

  - it fails the basic test of "can I implement interfaces resembling
core commands?"


Consider [lsearch].  Trying to mimic part of its interface with TIP#457
yields the following:

% proc lsearchy {
       {how -required 0 -default glob -switch {glob exact regexp sorted}}
       {all -switch all -required 0 -default 0}
       {inline -switch inline -required 0 -default 0}
       {not -switch not -required 0 -default 0}
       {start -switch start -required 0 -default 0}
       haystack needle} {
     list how $how all $all inline $inline not $not start $start
}


Compare a couple of misuses:

% lsearch -bad a b
bad option "-bad": must be -all, -ascii, -bisect, -decreasing,
-dictionary, -exact, -glob, -increasing, -index, -inline, -integer,
-nocase, -not, -real, -regexp, -sorted, -start, or -subindices

% lsearchy -bad a b
wrong # args: should be "lsearchy ?|-glob|-exact|-regexp|-sorted|?
?|-all|? ?|-inline|? ?|-not|? ?|-start|? haystack needle"


% lsearch -gl a a
0

% lsearchy -gl a a
wrong # args: should be "lsearchy ?|-glob|-exact|-regexp|-sorted|?
?|-all|? ?|-inline|? ?|-not|? ?|-start|? haystack needle"


Or valid uses:

% lsearch {-foo -bar -baz} -bar
1

% lsearchy {-foo -bar -baz} -bar
wrong # args: should be "lsearchy ?|-glob|-exact|-regexp|-sorted|?
?|-all|? ?|-inline|? ?|-not|? ?|-start|? haystack needle"


The error messages do not resemble core commands that take options,
there is no support for abbreviation and arguments that start with "-"
are verboten ... unless they happen to have list intreps.


There are further EIAS violations and shimmers - rather contrived in
this case, but conceivably problems in real code.

% unset x
% lsearchy [lappend x -all] a a
wrong # args: should be "lsearchy ?|-glob|-exact|-regexp|-sorted|?
?|--all|? ?|--inline|? ?|--not|? ?|--start|? haystack needle"

% set x [expr -2]; ::tcl::unsupported::representation $x
value is a int with a refcount of 3, object pointer at 0x24172d8,
internal representation 0xfffffffffffffffe:0x61616161, no string
representation
% lsearchy $x -2
wrong # args: should be "lsearchy ?|-glob|-exact|-regexp|-sorted|?
?|--all|? ?|--inline|? ?|--not|? ?|--start|? haystack needle"
% ::tcl::unsupported::representation $x
value is a int with a refcount of 4, object pointer at 0x24172d8,
internal representation 0xfffffffffffffffe:0x61616161, string
representation "-2"


Mathieu I honestly appreciate your hard work and the talent, patience
and boldness that have supported this proposal.  My aspect-tip288 branch
wouldn't exist without it, and many of the ideas discussed in this
thread would remain locked in people's heads or undeveloped.  It's
definitely a good thing TIP#457 happened, but it is not suitable for
inclusion in the core.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: TIP#457 is ill-considered

Mathieu Lafon
Hello,

On Wed, May 24, 2017 at 10:13 AM, aspect <[hidden email]> wrote:

> On 23/05/17 23:16, Mathieu Lafon wrote:
>> Users should be responsible enough to know to explicitely use "--" if
>> a) the next argument starts with a dash
>> b) the next argument is a variable whose content may start with a dash
>> c) the next argument can be expensive to stringify
>
> Please, no.  We know from experience that putting a foot-gun in users'
> hands does not end well.  [proc] should not encourage programmers to
> create fragile interfaces.  Tcl knows all too well the legacy of such
> problems.

I have added the request from Christian in the Discussion section of
the TIP. It's a very small modification and can be decided later
(based on user and TCT feedback) if the TIP is approved.

> As has already been mentioned in this thread, deterministic binding can
> be achieved by having the variable-length arglist in one position.

The latest version (commited a few hours ago and announced on the
list) now handle the case where there is a fixed number of arguments
after the named group.

>   - it is a radical change to argument binding, implemented in a way
> that can only be extended or improved by future TIPs.

Do you mean that it can't be extended dynamically? Having a framework
which can be later extended (with TIP) is IMHO a good point.

>   - it does not seem to be informed by practice (Tk, XOTcl, TIP#288,
> TIP#65, rl_parseargs, the Tcl_GetIndexFromObj pattern, many script level
> implementations)

What does you mean?

>   - it still commits EIAS violations and unneccessary shimmers

Do you refer to the current version? What EIAS violations and
unneccessary shimmers?

> % lsearch -bad a b
> bad option "-bad": must be -all, -ascii, -bisect, -decreasing,
> -dictionary, -exact, -glob, -increasing, -index, -inline, -integer,
> -nocase, -not, -real, -regexp, -sorted, -start, or -subindices
>
> % lsearchy -bad a b
> wrong # args: should be "lsearchy ?|-glob|-exact|-regexp|-sorted|?
> ?|-all|? ?|-inline|? ?|-not|? ?|-start|? haystack needle"

It is an automatically generated message for a generic handling of all
proc commands. It is normal that the message is not as specific as a
dedicated message can be.

> % lsearchy -gl a a
> wrong # args: should be "lsearchy ?|-glob|-exact|-regexp|-sorted|?
> ?|-all|? ?|-inline|? ?|-not|? ?|-start|? haystack needle"

This is not an issue for me, but a feature request. Abbreviation
handling could be added later if there is a need.

> % lsearchy {-foo -bar -baz} -bar
> wrong # args: should be "lsearchy ?|-glob|-exact|-regexp|-sorted|?
> ?|-all|? ?|-inline|? ?|-not|? ?|-start|? haystack needle"

Now works as expected.

> [...] and arguments that start with "-"
> are verboten ... unless they happen to have list intreps.

I don't understand. Can you clarify?

> % unset x
> % lsearchy [lappend x -all] a a
> wrong # args: should be "lsearchy ?|-glob|-exact|-regexp|-sorted|?
> ?|--all|? ?|--inline|? ?|--not|? ?|--start|? haystack needle"

Now works as expected.

> % set x [expr -2]; ::tcl::unsupported::representation $x
> value is a int with a refcount of 3, object pointer at 0x24172d8,
> internal representation 0xfffffffffffffffe:0x61616161, no string
> representation
> % lsearchy $x -2
> wrong # args: should be "lsearchy ?|-glob|-exact|-regexp|-sorted|?
> ?|--all|? ?|--inline|? ?|--not|? ?|--start|? haystack needle"

Now works as expected.

-- Mathieu

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Tcl-Core mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tcl-core
Loading...