Block-scoped variable declarations need to be supported

The following is a snippet of original C++ code:

static __declspec(noinline) int ProcessData(const int* data, size_t size, int operationType) {
	int result = 0;

	if (operationType == 1) {
		// Operation 1: Sum
		for (size_t i = 0; i < size; ++i) {
			result += data[i];
		}
	}
	else if (operationType == 2) {
		// Operation 2: Product
		result = 1;
		for (size_t i = 0; i < size; ++i) {
			result *= data[i];
		}
	}
	else {
		// Operation 3: Count even numbers
		for (size_t i = 0; i < size; ++i) {
			if (data[i] % 2 == 0) result++;
		}
	}

	return result;
}

Below is the decompiled output from IDA:

int __cdecl ProcessData(const int *myData, unsigned int size, int operationType)
{
  unsigned int k; // [esp+D0h] [ebp-2Ch]
  unsigned int j; // [esp+DCh] [ebp-20h]
  unsigned int i; // [esp+E8h] [ebp-14h]
  int result; // [esp+F4h] [ebp-8h]

  __CheckForDebuggerJustMyCode(&E2398E0F_ConsoleApplication1_cpp);
  result = 0;
  if ( operationType == 1 )
  {
    for ( i = 0; i < size; ++i )
      result += myData[i];
  }
  else if ( operationType == 2 )
  {
    result = 1;
    for ( j = 0; j < size; ++j )
      result *= myData[j];
  }
  else
  {
    for ( k = 0; k < size; ++k )
    {
      if ( !(myData[k] % 2) )
        ++result;
    }
  }
  return result;
}

There is an issue with IDA’s output here: it converts the original three block-scoped variables, which all shared the name i, into three separate function-scoped variables i, j, and k. How can we make IDA’s output format support the original C++ style of block-scoped variables? If it’s not currently supported, could this feature be added in the future?

Binary Ninja is already close to supporting this. Here is its output:

00412330    int32_t ProcessData(int32_t const* data, uint32_t size, int32_t operationType)

00412330    {
00412330        void var_34;
00412349        __builtin_memset(&var_34, 0xcccccccc, 0x30);
00412350        uint8_t* entry_JMC_flag;
00412350        j_@__CheckForDebuggerJustMyCode@4(entry_JMC_flag);
00412356        int32_t result = 0;
00412356        
00412361        if (operationType == 1)
00412361        {
00412363            for (int32_t i = 0; i < size; i += 1)
0041237b                result += data[i];
00412361        }
00412361        else if (operationType != 2)
00412394        {
004123cb            for (int32_t i_1 = 0; i_1 < size; i_1 += 1)
004123e3            {
004123e3                int32_t edx_6 = data[i_1] & 0x80000001;
004123e3                
004123f4                if (edx_6 < 0)
004123fa                    edx_6 = ((edx_6 - 1) | 0xfffffffe) + 1;
004123fa                
004123fd                if (!edx_6)
00412405                    result += 1;
004123e3            }
00412394        }
00412394        else
00412394        {
00412396            result = 1;
00412396            
004123b5            for (int32_t i_2 = 0; i_2 < size; i_2 += 1)
004123b5                result *= data[i_2];
00412394        }
00412394        
00412418        j___RTC_CheckEsp();
00412420        return result;
00412330    }

In practical reverse engineering, block-scoped variables matter a great deal. For example, in functions compiled with optimizations, a single storage location (a specific register or stack slot) is often reused across multiple scopes.

For instance, its role in the 1st scope might be isValue, in the 2nd scope it might be index, and in the 3rd scope it might be age. In IDA’s current C89-style pseudocode, we are forced to name this single reused variable something cumbersome like isValue__OR__index__OR__age. Alternatively, we could create a union as an indirect workaround, but unions require manually selecting the correct member for every single code block, which is very tedious.
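To make the union workaround concrete, here is a minimal hand-written sketch of what it amounts to. The names ReusedSlot and UseSlot are hypothetical and exist only for illustration; only the three member names come from the example above. It is not decompiler output.

```cpp
// Hypothetical union describing one reused 4-byte stack slot.
// Only the member names are taken from the example above.
union ReusedSlot {
    int isValue;  // role in the 1st scope
    int index;    // role in the 2nd scope
    int age;      // role in the 3rd scope
};

// All three roles share the same storage, just like the reused slot.
static_assert(sizeof(ReusedSlot) == sizeof(int), "one slot, three names");

// Every block has to spell out the member it means, by hand.
int UseSlot(int argc) {
    ReusedSlot slot;
    int total = 0;

    // 1st scope: the slot is a flag.
    slot.isValue = (argc > 1);
    if (slot.isValue) total += 100;

    // 2nd scope: the slot is a loop counter.
    for (slot.index = 0; slot.index < argc; ++slot.index) total += 1;

    // 3rd scope: the slot is an age (fixed value, for illustration only).
    slot.age = 30;
    if (slot.age >= 0) total += slot.age;

    return total;
}
```

Even in this tiny function, the member has to be chosen three times. In a decompiler that choice is made per use through the UI, which is where the tedium comes from.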

When functions are inlined, this issue becomes magnified and severely disrupts analysis. Therefore, I sincerely hope the official development team can add support for block-scoped variables in the decompiler.

The “Force new variable” command has disappeared since version 8.4, after being broken for a few versions before that, hasn’t it?

Yes, in the new version of Hex-Rays this popup entry was removed.
But I got mixed up with my tip:
I meant mapped variables.

PS
j___RTC_CheckEsp(); should be removed by new versions of the decompiler. If it is not removed, then an old decompiler is being used, or perhaps not a full version, for example the cloud decompiler of a demo version.

@revs Could you please read my question carefully before replying? j___RTC_CheckEsp(); is the output from Binary Ninja, not IDA as you assumed.

Furthermore, features like ‘Force new variable’ or ‘Map to another variable’ from the article you cited are not automated operations. They require manual analysis and modification, which is completely unrealistic for large files. Moreover, ‘Force new variable’ doesn’t even apply to register variables.

Since a relatively young decompiler like Binary Ninja can achieve automatic block-scoped variables, I don’t think this should be a difficult problem for a decompiler like IDA, which has over 30 years of development history. In fact, to me, the implementation seems extremely simple. One post-processing approach could be:

First, traverse the CTree AST:
a. If a variable only appears within a specific scope, place its declaration at the top of that scope rather than at the top of the entire function.
b. If a variable (let’s call it aaa) appears in multiple distinct scopes, but its last usage in each scope is a READ operation (strictly fetching its value, not taking its address), then you can create a new variable for each scope without worrying about whether they share the same underlying address as aaa. Once it leaves its block scope, its lifecycle ends and it becomes invalid, allowing the system to reclaim its memory space for other variables. By restricting the condition to the last operation being a read, we can be certain it won’t be accessed by other code segments within the function, making this transformation safe.
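Step (a) can be sketched on a toy scope tree as follows. This is only an illustration of the idea, not the Hex-Rays ctree API: ScopeTree, VarUse, and DeclarationScopes are hypothetical names, and the read/write check from step (b) is not modeled at all.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy scope tree (NOT the Hex-Rays ctree): each block only knows its parent.
struct ScopeTree {
    std::vector<int> parent;  // parent[b] is the enclosing block, -1 for the function body

    int AddBlock(int parentBlock) {
        parent.push_back(parentBlock);
        return static_cast<int>(parent.size()) - 1;
    }

    int Depth(int block) const {
        int d = 0;
        while (parent[block] != -1) { block = parent[block]; ++d; }
        return d;
    }

    // Innermost block that encloses both a and b (lowest common ancestor).
    int CommonScope(int a, int b) const {
        int da = Depth(a), db = Depth(b);
        while (da > db) { a = parent[a]; --da; }
        while (db > da) { b = parent[b]; --db; }
        while (a != b) { a = parent[a]; b = parent[b]; }
        return a;
    }
};

// One recorded use of a variable inside some block.
struct VarUse {
    std::string name;
    int block;
};

// Step (a): for every variable, find the innermost block containing all of its uses.
// The declaration can then be emitted at the top of that block.
std::map<std::string, int> DeclarationScopes(const ScopeTree& tree,
                                             const std::vector<VarUse>& uses) {
    std::map<std::string, int> scope;
    for (const VarUse& use : uses) {
        auto it = scope.find(use.name);
        if (it == scope.end()) scope[use.name] = use.block;
        else it->second = tree.CommonScope(it->second, use.block);
    }
    return scope;
}
```

Applied to the ProcessData pseudocode above, with one block per branch, i, j, and k each resolve to their own branch, while result resolves to the function body because it is used in every branch and in the return statement.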

Even taking a step back—if the official developers feel that relying on the final read operation is not robust enough as a deciding factor—in this scenario (one variable being used across multiple independent scopes), having the system automatically create a union for aaa and automatically select different union fields for different scopes would also be acceptable to me.

I’ve tried hrtng’s “Unite var reuse”. It often adds garbage fields to the union and requires manual intervention, so automating it will produce a mess instead of clean code.
One more unsolvable problem with unions: it is impossible to select two different union fields at one address in assignments like
unionA.field1 = unionB.field2

My vote is for repairing and bringing back the “Force new variable” command.

Creating a union is merely a fallback solution. If official support for C99-style block-scoped variables were provided, all of these workarounds would be completely unnecessary. Furthermore, as I mentioned above, it is entirely possible to deduce which variables are block-scoped through AST inference.

Could you come up with a real world example of that? Normally data flow analysis would create separate variables if the decompiler can prove that these uses are unrelated to each other. There are some corner cases where it can’t, but they are minority.

I don’t remember removing it; it probably just got renamed to “split variable”.

The following code is a perfect reproduction of the isValue__OR__index__OR__age issue I mentioned.

#include <stdio.h>

// Used to force local variables to be allocated on the stack (preventing discrepancies caused by pure register allocation).
// By assigning the address of a local variable to a volatile pointer, the compiler cannot place the variable entirely in registers.
// At the same time, because their lifecycles do not overlap, MSVC is highly inclined to reuse the same stack memory (Stack Slot) for them.
volatile int* g_force_stack_ptr;

// Check if a string is a pure number (no decimals)
int CheckIsValue(const char* str) {
    if (!str || !*str) return 0;
    while (*str) {
        if (*str < '0' || *str > '9') {
            return 0; // Encountered a non-numeric character
        }
        str++;
    }
    return 1;
}

// Get the user's age
int GetUserAge() {
    int age = -1;
    printf("Please enter your age: ");
    // Ignore the return value, just for demo purposes
    (void)scanf_s("%d", &age);
    return age;
}

// Simulate a target function prone to variable reuse (Stack Slot Packing)
void DemoVariableReuse(int argc, char* argv[]) {
    // -------------------------------------------------------------
    // In the three code blocks (Scopes) below, we have three local variables: 
    //   1. isValue
    //   2. index
    //   3. age
    // Not only do their lifecycles not overlap, but we also take their addresses via g_force_stack_ptr.
    // This forces the compiler to allocate them on the stack. And because their lifetimes are staggered,
    // the compiler performs Stack Slot Packing, making them share the same stack address.
    // In decompilers like IDA, this will 100% map to the same stack local variable (e.g., int var_18;),
    // forcing reverse engineers to name it something like isValue__OR__index__OR__age
    // -------------------------------------------------------------

    {
        // Scope 1: Check if it's a value (isValue)
        int isValue = 0;
        g_force_stack_ptr = &isValue; // Force onto stack
        if (argc > 1) {
            isValue = CheckIsValue(argv[1]);
        }
        if (isValue) {
            printf("Argument 1 [ %s ] is a pure numeric value.\n", argv[1]);
        } else if (argc > 1) {
            printf("Argument 1 [ %s ] is not a pure numeric value.\n", argv[1]);
        }
    }

    {
        // Scope 2: Iterate through argument index (index)
        int index = 0;
        g_force_stack_ptr = &index; // Force onto stack
        printf("Command line arguments list:\n");
        for (index = 0; index < argc; ++index) {
            printf("  [%d]: %s\n", index, argv[index]);
        }
    }

    {
        // Scope 3: User's age (age)
        int age = GetUserAge();
        g_force_stack_ptr = &age; // Force onto stack
        if (age >= 0) {
            printf("Obtained age: %d\n", age);
        }
    }
}

int main(int argc, char* argv[]) {
    DemoVariableReuse(argc, argv);
    return *g_force_stack_ptr;
}

If you compile this using VS2022 with the x64 Release configuration, IDA 9.3 SP2 will output the following:

int __fastcall main(int argc, const char **argv, const char **envp)
{
  int argc_3; // ebx
  const char *v6; // rdx
  char v7; // al
  int argc_1; // eax
  int *?g_force_stack_ptr@@3PECHEC; // rax
  int argc_2; // [rsp+20h] [rbp-18h] BYREF
  int argc_4; // [rsp+24h] [rbp-14h] BYREF

  argc_3 = 0;
  argc_2 = 0;
  g_force_stack_ptr = &argc_2;
  if ( argc <= 1 )
  {
    argc_2 = 0;
    g_force_stack_ptr = &argc_2;
    printf("Command line arguments list:\n");
    argc_2 = 0;
    argc_1 = 0;
    if ( argc <= 0 )
      goto LABEL_11;
  }
  else
  {
    v6 = argv[1];
    if ( v6 )
    {
      v7 = *v6;
      if ( *v6 )
      {
        while ( (unsigned __int8)(v7 - 48) <= 9u )
        {
          v7 = *++v6;
          if ( !v7 )
          {
            argc_2 = 1;
            printf("Argument 1 [ %s ] is a pure numeric value.\n", argv[1]);
            goto LABEL_8;
          }
        }
      }
    }
    printf("Argument 1 [ %s ] is not a pure numeric value.\n", argv[1]);
LABEL_8:
    argc_2 = 0;
    g_force_stack_ptr = &argc_2;
    printf("Command line arguments list:\n");
    argc_1 = 0;
    argc_2 = 0;
  }
  do
  {
    printf("  [%d]: %s\n", argc_1, argv[argc_3]);
    argc_1 = argc_2 + 1;
    argc_2 = argc_1;
    argc_3 = argc_1;
  }
  while ( argc_1 < argc );
LABEL_11:
  argc_2 = -1;
  printf("Please enter your age: ");
  scanf_s("%d", &argc_2);
  ?g_force_stack_ptr@@3PECHEC = &argc_4;
  g_force_stack_ptr = &argc_4;
  argc_4 = argc_2;
  if ( argc_2 >= 0 )
  {
    printf("Obtained age: %d\n", argc_2);
    ?g_force_stack_ptr@@3PECHEC = (int *)g_force_stack_ptr;
  }
  return *?g_force_stack_ptr@@3PECHEC;
}

As you can clearly see, argc_2 perfectly demonstrates the confusing isValue__OR__index__OR__age problem I was talking about.